Overview

Dataset statistics

Number of variables: 21
Number of observations: 45345
Missing cells: 53685
Missing cells (%): 5.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.3 MiB
Average record size in memory: 168.0 B
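
As a quick sanity check, the missing-cell percentage can be recomputed from the counts above. This sketch assumes the percentage is defined as missing cells divided by all cells (variables × observations); the variable names are illustrative and not part of the report.

```python
# Counts copied from the dataset statistics above.
n_variables = 21
n_observations = 45345
missing_cells = 53685

# Assumption: the percentage is missing cells over all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
```

Under that assumption the result rounds to the 5.6% shown in the report.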

Variable types

Numeric: 9
Categorical: 11
DateTime: 1

Alerts

genres has a high cardinality: 4064 distinct values [High cardinality]
original_language has a high cardinality: 89 distinct values [High cardinality]
overview has a high cardinality: 44231 distinct values [High cardinality]
production_companies has a high cardinality: 22666 distinct values [High cardinality]
production_countries has a high cardinality: 2388 distinct values [High cardinality]
spoken_languages has a high cardinality: 1841 distinct values [High cardinality]
tagline has a high cardinality: 20269 distinct values [High cardinality]
title has a high cardinality: 42195 distinct values [High cardinality]
Director has a high cardinality: 18828 distinct values [High cardinality]
actores has a high cardinality: 42656 distinct values [High cardinality]
budget is highly overall correlated with revenue and 1 other field [High correlation]
popularity is highly overall correlated with vote_count [High correlation]
revenue is highly overall correlated with budget and 2 other fields [High correlation]
vote_count is highly overall correlated with popularity and 1 other field [High correlation]
return is highly overall correlated with budget and 1 other field [High correlation]
original_language is highly imbalanced (67.4%) [Imbalance]
production_countries is highly imbalanced (57.7%) [Imbalance]
spoken_languages is highly imbalanced (62.0%) [Imbalance]
status is highly imbalanced (97.0%) [Imbalance]
genres has 2383 (5.3%) missing values [Missing]
overview has 941 (2.1%) missing values [Missing]
production_companies has 11788 (26.0%) missing values [Missing]
production_countries has 6207 (13.7%) missing values [Missing]
spoken_languages has 3888 (8.6%) missing values [Missing]
tagline has 24958 (55.0%) missing values [Missing]
Director has 835 (1.8%) missing values [Missing]
actores has 2348 (5.2%) missing values [Missing]
popularity is highly skewed (γ1 = 29.2152706) [Skewed]
return is highly skewed (γ1 = 138.2822621) [Skewed]
overview is uniformly distributed [Uniform]
tagline is uniformly distributed [Uniform]
title is uniformly distributed [Uniform]
actores is uniformly distributed [Uniform]
id has unique values [Unique]
budget has 36469 (80.4%) zeros [Zeros]
revenue has 37948 (83.7%) zeros [Zeros]
runtime has 1534 (3.4%) zeros [Zeros]
vote_average has 2943 (6.5%) zeros [Zeros]
vote_count has 2845 (6.3%) zeros [Zeros]
return has 39970 (88.1%) zeros [Zeros]

Reproduction

Analysis started: 2023-06-13 14:37:38.193667
Analysis finished: 2023-06-13 14:37:55.906253
Duration: 17.71 seconds
Software version: pandas-profiling v3.6.6
Download configuration: config.json

Variables

budget
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 1223
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4232673.1
Minimum: 0
Maximum: 3.8 × 10⁸
Zeros: 36469
Zeros (%): 80.4%
Negative: 0
Negative (%): 0.0%
Memory size: 354.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 25000000
Maximum: 3.8 × 10⁸
Range: 3.8 × 10⁸
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 17443912
Coefficient of variation (CV): 4.1212518
Kurtosis: 66.61671
Mean: 4232673.1
Median Absolute Deviation (MAD): 0
Skewness: 7.1179263
Sum: 1.9193056 × 10¹¹
Variance: 3.0429006 × 10¹⁴
Monotonicity: Not monotonic
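
Several of the descriptive statistics above follow from one another, which gives a cheap consistency check. The sketch below assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations) and takes the observation count from the overview; the variable names are illustrative.

```python
import math

# Values copied from the budget statistics and the dataset overview.
n = 45345            # number of observations (budget has no missing values)
mean = 4232673.1
std = 17443912

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the squared standard deviation
total = mean * n         # sum of the column

print(f"CV       = {cv:.7f}")
print(f"variance = {variance:.7e}")
print(f"sum      = {total:.7e}")
```

Because the inputs are already rounded, agreement with the reported figures can only be expected to within rounding.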
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
0                     36469   80.4%
5000000               286     0.6%
10000000              258     0.6%
20000000              243     0.5%
2000000               242     0.5%
15000000              226     0.5%
3000000               223     0.5%
25000000              206     0.5%
1000000               197     0.4%
30000000              189     0.4%
Other values (1213)   6806    15.0%

Value                 Count   Frequency (%)
0                     36469   80.4%
1                     25      0.1%
2                     14      < 0.1%
3                     9       < 0.1%
4                     7       < 0.1%
5                     8       < 0.1%
6                     5       < 0.1%
7                     4       < 0.1%
8                     5       < 0.1%
9                     1       < 0.1%

Value                 Count   Frequency (%)
380000000             1       < 0.1%
300000000             1       < 0.1%
280000000             1       < 0.1%
270000000             1       < 0.1%
260000000             3       < 0.1%
258000000             1       < 0.1%
255000000             1       < 0.1%
250000000             10      < 0.1%
245000000             2       < 0.1%
237000000             1       < 0.1%

genres
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 4064
Distinct (%): 9.5%
Missing: 2383
Missing (%): 5.3%
Memory size: 354.4 KiB

Value                 Count
Drama                 4994
Comedy                3620
Documentary           2711
Drama, Romance        1300
Comedy, Drama         1133
Other values (4059)   29204

Length

Max length: 80
Median length: 65
Mean length: 16.459243
Min length: 3

Characters and Unicode

Total characters: 707122
Distinct characters: 30
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2370
Unique (%): 5.5%

Sample

1st row: Animation, Comedy, Family
2nd row: Adventure, Fantasy, Family
3rd row: Romance, Comedy
4th row: Comedy, Drama, Romance
5th row: Comedy
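
Each genres cell holds a comma-separated list, so the column is counted by combination rather than by individual genre; "Comedy, Drama" and "Drama, Comedy" are distinct values. A minimal sketch of splitting the cells into individual labels, using only the five sample rows above (the helper name `split_genres` is hypothetical):

```python
from collections import Counter

# The five sample rows shown above.
sample_rows = [
    "Animation, Comedy, Family",
    "Adventure, Fantasy, Family",
    "Romance, Comedy",
    "Comedy, Drama, Romance",
    "Comedy",
]

def split_genres(cell):
    """Split one comma-separated cell into a list of genre labels."""
    return [part.strip() for part in cell.split(",") if part.strip()]

# Count each individual genre across all rows.
genre_counts = Counter(g for row in sample_rows for g in split_genres(row))

print(genre_counts.most_common(3))
```

Counting individual labels this way is order-insensitive, which is one reason the per-genre view is much smaller than the 4064 combinations.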

Common Values

Value                    Count   Frequency (%)
Drama                    4994    11.0%
Comedy                   3620    8.0%
Documentary              2711    6.0%
Drama, Romance           1300    2.9%
Comedy, Drama            1133    2.5%
Horror                   974     2.1%
Comedy, Romance          930     2.1%
Comedy, Drama, Romance   593     1.3%
Drama, Comedy            531     1.2%
Horror, Thriller         528     1.2%
Other values (4054)      25648   56.6%
(Missing)                2383    5.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
drama 20234
21.4%
comedy 13175
13.9%
thriller 7614
 
8.0%
romance 6728
 
7.1%
action 6588
 
7.0%
horror 4668
 
4.9%
crime 4302
 
4.5%
documentary 3919
 
4.1%
adventure 3488
 
3.7%
science 3037
 
3.2%
Other values (12) 21006
22.2%

Most occurring characters

ValueCountFrequency (%)
r 69015
 
9.8%
a 61748
 
8.7%
e 55716
 
7.9%
m 53051
 
7.5%
51797
 
7.3%
o 48491
 
6.9%
, 47995
 
6.8%
i 39613
 
5.6%
n 35622
 
5.0%
y 28487
 
4.0%
Other values (20) 215587
30.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 511806
72.4%
Uppercase Letter 95524
 
13.5%
Space Separator 51797
 
7.3%
Other Punctuation 47995
 
6.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 69015
13.5%
a 61748
12.1%
e 55716
10.9%
m 53051
10.4%
o 48491
9.5%
i 39613
7.7%
n 35622
7.0%
y 28487
5.6%
c 27943
5.5%
t 26169
 
5.1%
Other values (7) 65951
12.9%
Uppercase Letter
ValueCountFrequency (%)
D 24153
25.3%
C 17477
18.3%
A 12004
12.6%
F 9729
10.2%
T 8379
 
8.8%
R 6728
 
7.0%
H 6065
 
6.3%
M 4823
 
5.0%
S 3037
 
3.2%
W 2364
 
2.5%
Space Separator
ValueCountFrequency (%)
51797
100.0%
Other Punctuation
ValueCountFrequency (%)
, 47995
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 607330
85.9%
Common 99792
 
14.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 69015
11.4%
a 61748
 
10.2%
e 55716
 
9.2%
m 53051
 
8.7%
o 48491
 
8.0%
i 39613
 
6.5%
n 35622
 
5.9%
y 28487
 
4.7%
c 27943
 
4.6%
t 26169
 
4.3%
Other values (18) 161475
26.6%
Common
ValueCountFrequency (%)
51797
51.9%
, 47995
48.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 707122
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 69015
 
9.8%
a 61748
 
8.7%
e 55716
 
7.9%
m 53051
 
7.5%
51797
 
7.3%
o 48491
 
6.9%
, 47995
 
6.8%
i 39613
 
5.6%
n 35622
 
5.0%
y 28487
 
4.0%
Other values (20) 215587
30.5%

id
Real number (ℝ)

Distinct: 45345
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 108035.74
Minimum: 2
Maximum: 469172
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 354.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 5339.4
Q1: 26390
Median: 59852
Q3: 156597
95-th percentile: 357281.4
Maximum: 469172
Range: 469170
Interquartile range (IQR): 130207

Descriptive statistics

Standard deviation: 112180.08
Coefficient of variation (CV): 1.0383608
Kurtosis: 0.55862472
Mean: 108035.74
Median Absolute Deviation (MAD): 44403
Skewness: 1.2828995
Sum: 4.8988807 × 10⁹
Variance: 1.258437 × 10¹⁰
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
862 1
 
< 0.1%
202198 1
 
< 0.1%
124026 1
 
< 0.1%
300168 1
 
< 0.1%
132316 1
 
< 0.1%
74458 1
 
< 0.1%
40777 1
 
< 0.1%
188222 1
 
< 0.1%
328483 1
 
< 0.1%
107637 1
 
< 0.1%
Other values (45335) 45335
> 99.9%
ValueCountFrequency (%)
2 1
< 0.1%
3 1
< 0.1%
5 1
< 0.1%
6 1
< 0.1%
11 1
< 0.1%
12 1
< 0.1%
13 1
< 0.1%
14 1
< 0.1%
15 1
< 0.1%
16 1
< 0.1%
ValueCountFrequency (%)
469172 1
< 0.1%
468707 1
< 0.1%
468343 1
< 0.1%
467731 1
< 0.1%
465044 1
< 0.1%
464819 1
< 0.1%
464207 1
< 0.1%
464111 1
< 0.1%
463906 1
< 0.1%
463800 1
< 0.1%

original_language
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 89
Distinct (%): 0.2%
Missing: 11
Missing (%): < 0.1%
Memory size: 354.4 KiB

Value               Count
en                  32184
fr                  2435
it                  1528
ja                  1346
de                  1077
Other values (84)   6764

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 90668
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 17
Unique (%): < 0.1%

Sample

1st row: en
2nd row: en
3rd row: en
4th row: en
5th row: en

Common Values

Value               Count   Frequency (%)
en                  32184   71.0%
fr                  2435    5.4%
it                  1528    3.4%
ja                  1346    3.0%
de                  1077    2.4%
es                  991     2.2%
ru                  822     1.8%
hi                  508     1.1%
ko                  444     1.0%
zh                  408     0.9%
Other values (79)   3591    7.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
en 32184
71.0%
fr 2435
 
5.4%
it 1528
 
3.4%
ja 1346
 
3.0%
de 1077
 
2.4%
es 991
 
2.2%
ru 822
 
1.8%
hi 508
 
1.1%
ko 444
 
1.0%
zh 408
 
0.9%
Other values (79) 3591
 
7.9%

Most occurring characters

ValueCountFrequency (%)
e 34507
38.1%
n 32892
36.3%
r 3628
 
4.0%
f 2830
 
3.1%
i 2386
 
2.6%
t 2249
 
2.5%
a 1834
 
2.0%
s 1650
 
1.8%
j 1347
 
1.5%
d 1321
 
1.5%
Other values (16) 6024
 
6.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 90668
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 34507
38.1%
n 32892
36.3%
r 3628
 
4.0%
f 2830
 
3.1%
i 2386
 
2.6%
t 2249
 
2.5%
a 1834
 
2.0%
s 1650
 
1.8%
j 1347
 
1.5%
d 1321
 
1.5%
Other values (16) 6024
 
6.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 90668
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 34507
38.1%
n 32892
36.3%
r 3628
 
4.0%
f 2830
 
3.1%
i 2386
 
2.6%
t 2249
 
2.5%
a 1834
 
2.0%
s 1650
 
1.8%
j 1347
 
1.5%
d 1321
 
1.5%
Other values (16) 6024
 
6.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 90668
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 34507
38.1%
n 32892
36.3%
r 3628
 
4.0%
f 2830
 
3.1%
i 2386
 
2.6%
t 2249
 
2.5%
a 1834
 
2.0%
s 1650
 
1.8%
j 1347
 
1.5%
d 1321
 
1.5%
Other values (16) 6024
 
6.6%

overview
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct: 44231
Distinct (%): 99.6%
Missing: 941
Missing (%): 2.1%
Memory size: 354.4 KiB
No overview found.
 
133
No Overview
 
7
 
5
No movie overview available.
 
3
A few funny little novels about different aspects of life.
 
3
Other values (44226)
44253 

Length

Max length: 1000
Median length: 786
Mean length: 323.26504
Min length: 1

Characters and Unicode

Total characters: 14354261
Distinct characters: 429
Distinct categories: 25
Distinct scripts: 13
Distinct blocks: 21

Unique

Unique: 44200
Unique (%): 99.5%

Sample

1st row: Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences.
2nd row: When siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures.
3rd row: A family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max.
4th row: Cheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe.
5th row: Just when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own.

Common Values

ValueCountFrequency (%)
No overview found. 133
 
0.3%
No Overview 7
 
< 0.1%
5
 
< 0.1%
No movie overview available. 3
 
< 0.1%
A few funny little novels about different aspects of life. 3
 
< 0.1%
Adaptation of the Jane Austen novel. 3
 
< 0.1%
Funny, entertaining comedy with a few storylines. All of them have one thing in common - a resort town of Rimini in Italy. 2
 
< 0.1%
When four women move into an old house left by one woman's aunt, strange things begin to happen. Bizarre voices, visions of ghosts, and mysterious noises lead them to discover the darkest powers of evil and a horror and agony beyond terror. 2
 
< 0.1%
Alien pods come to Earth and, naturally, start taking over Human Hosts. One such pod only manages to take over one human's, Shin Izumi, right arm. Together they grow and co-exist, all the while the other aliens are making meals of other humans; Shin feels he must put a stop to it all, but his alien, Migi, doesn't see why. 2
 
< 0.1%
The ghost of a samurai's wife takes revenge on her husband. 2
 
< 0.1%
Other values (44221) 44242
97.6%
(Missing) 941
 
2.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 137966
 
5.6%
a 98822
 
4.0%
and 75193
 
3.1%
to 73260
 
3.0%
of 69523
 
2.8%
in 48107
 
2.0%
is 36478
 
1.5%
his 36130
 
1.5%
with 23880
 
1.0%
her 21460
 
0.9%
Other values (97091) 1825938
74.6%

Most occurring characters

ValueCountFrequency (%)
2404430
16.8%
e 1362728
 
9.5%
a 939715
 
6.5%
t 934049
 
6.5%
i 850832
 
5.9%
o 829241
 
5.8%
n 821942
 
5.7%
s 767214
 
5.3%
r 743641
 
5.2%
h 600332
 
4.2%
Other values (419) 4100137
28.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11141168
77.6%
Space Separator 2404468
 
16.8%
Uppercase Letter 390650
 
2.7%
Other Punctuation 312579
 
2.2%
Decimal Number 42192
 
0.3%
Dash Punctuation 36745
 
0.3%
Close Punctuation 10094
 
0.1%
Open Punctuation 10071
 
0.1%
Final Punctuation 4549
 
< 0.1%
Initial Punctuation 880
 
< 0.1%
Other values (15) 865
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1362728
12.2%
a 939715
 
8.4%
t 934049
 
8.4%
i 850832
 
7.6%
o 829241
 
7.4%
n 821942
 
7.4%
s 767214
 
6.9%
r 743641
 
6.7%
h 600332
 
5.4%
l 478415
 
4.3%
Other values (142) 2813059
25.2%
Uppercase Letter
ValueCountFrequency (%)
A 42722
 
10.9%
T 35939
 
9.2%
S 31102
 
8.0%
M 23942
 
6.1%
B 23679
 
6.1%
C 22771
 
5.8%
H 19415
 
5.0%
W 18633
 
4.8%
I 16782
 
4.3%
D 16306
 
4.2%
Other values (77) 139359
35.7%
Other Letter
ValueCountFrequency (%)
6
 
4.8%
6
 
4.8%
5
 
4.0%
4
 
3.2%
3
 
2.4%
3
 
2.4%
3
 
2.4%
3
 
2.4%
2
 
1.6%
2
 
1.6%
Other values (76) 88
70.4%
Other Punctuation
ValueCountFrequency (%)
, 133326
42.7%
. 124702
39.9%
' 31096
 
9.9%
" 11660
 
3.7%
: 3294
 
1.1%
? 2759
 
0.9%
; 2492
 
0.8%
! 1540
 
0.5%
/ 765
 
0.2%
& 452
 
0.1%
Other values (12) 493
 
0.2%
Nonspacing Mark
ValueCountFrequency (%)
́ 4
12.1%
ి 4
12.1%
3
9.1%
̈ 3
9.1%
3
9.1%
3
9.1%
2
 
6.1%
2
 
6.1%
2
 
6.1%
2
 
6.1%
Other values (4) 5
15.2%
Decimal Number
ValueCountFrequency (%)
1 9738
23.1%
0 8262
19.6%
9 6399
15.2%
2 4249
10.1%
5 2439
 
5.8%
8 2378
 
5.6%
3 2338
 
5.5%
4 2173
 
5.2%
7 2131
 
5.1%
6 2085
 
4.9%
Spacing Mark
ValueCountFrequency (%)
11
40.7%
4
 
14.8%
3
 
11.1%
3
 
11.1%
ि 2
 
7.4%
2
 
7.4%
1
 
3.7%
ி 1
 
3.7%
Dash Punctuation
ValueCountFrequency (%)
- 35222
95.9%
881
 
2.4%
633
 
1.7%
5
 
< 0.1%
4
 
< 0.1%
Other Symbol
ValueCountFrequency (%)
® 45
70.3%
14
 
21.9%
¦ 2
 
3.1%
° 2
 
3.1%
1
 
1.6%
Math Symbol
ValueCountFrequency (%)
~ 20
50.0%
+ 11
27.5%
= 6
 
15.0%
| 2
 
5.0%
1
 
2.5%
Open Punctuation
ValueCountFrequency (%)
( 10018
99.5%
[ 50
 
0.5%
{ 2
 
< 0.1%
1
 
< 0.1%
Currency Symbol
ValueCountFrequency (%)
$ 317
96.4%
£ 10
 
3.0%
1
 
0.3%
1
 
0.3%
Space Separator
ValueCountFrequency (%)
2404430
> 99.9%
  36
 
< 0.1%
  2
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 10042
99.5%
] 50
 
0.5%
} 2
 
< 0.1%
Final Punctuation
ValueCountFrequency (%)
3842
84.5%
688
 
15.1%
» 19
 
0.4%
Initial Punctuation
ValueCountFrequency (%)
670
76.1%
192
 
21.8%
« 18
 
2.0%
Control
ValueCountFrequency (%)
106
96.4%
’ 3
 
2.7%
 1
 
0.9%
Modifier Symbol
ValueCountFrequency (%)
´ 25
65.8%
` 12
31.6%
¯ 1
 
2.6%
Format
ValueCountFrequency (%)
31
60.8%
­ 20
39.2%
Other Number
ValueCountFrequency (%)
¹ 8
50.0%
½ 8
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 19
100.0%
Line Separator
ValueCountFrequency (%)
7
100.0%
Letter Number
ValueCountFrequency (%)
2
100.0%
Paragraph Separator
ValueCountFrequency (%)
2
100.0%
Modifier Letter
ValueCountFrequency (%)
ʼ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 11526586
80.3%
Common 2822256
 
19.7%
Cyrillic 4587
 
< 0.1%
Greek 648
 
< 0.1%
Devanagari 77
 
< 0.1%
Telugu 30
 
< 0.1%
Hiragana 20
 
< 0.1%
Tamil 19
 
< 0.1%
Han 10
 
< 0.1%
Hangul 9
 
< 0.1%
Other values (3) 19
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1362728
11.8%
a 939715
 
8.2%
t 934049
 
8.1%
i 850832
 
7.4%
o 829241
 
7.2%
n 821942
 
7.1%
s 767214
 
6.7%
r 743641
 
6.5%
h 600332
 
5.2%
l 478415
 
4.2%
Other values (132) 3198477
27.7%
Common
ValueCountFrequency (%)
2404430
85.2%
, 133326
 
4.7%
. 124702
 
4.4%
- 35222
 
1.2%
' 31096
 
1.1%
" 11660
 
0.4%
) 10042
 
0.4%
( 10018
 
0.4%
1 9738
 
0.3%
0 8262
 
0.3%
Other values (71) 43760
 
1.6%
Cyrillic
ValueCountFrequency (%)
о 470
 
10.2%
е 404
 
8.8%
а 373
 
8.1%
н 323
 
7.0%
и 299
 
6.5%
т 265
 
5.8%
р 240
 
5.2%
с 218
 
4.8%
в 173
 
3.8%
л 161
 
3.5%
Other values (46) 1661
36.2%
Greek
ValueCountFrequency (%)
α 60
 
9.3%
ο 55
 
8.5%
τ 43
 
6.6%
η 36
 
5.6%
ι 36
 
5.6%
ν 34
 
5.2%
ε 31
 
4.8%
ρ 31
 
4.8%
ς 30
 
4.6%
π 30
 
4.6%
Other values (33) 262
40.4%
Devanagari
ValueCountFrequency (%)
11
 
14.3%
6
 
7.8%
6
 
7.8%
5
 
6.5%
4
 
5.2%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
Other values (21) 30
39.0%
Hiragana
ValueCountFrequency (%)
4
20.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
Other values (7) 7
35.0%
Telugu
ValueCountFrequency (%)
ి 4
13.3%
3
10.0%
3
10.0%
3
10.0%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
Other values (6) 6
20.0%
Tamil
ValueCountFrequency (%)
3
15.8%
2
10.5%
2
10.5%
2
10.5%
2
10.5%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
Other values (3) 3
15.8%
Han
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Hangul
ValueCountFrequency (%)
2
22.2%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Thai
ValueCountFrequency (%)
2
25.0%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arabic
ValueCountFrequency (%)
م 2
50.0%
ہ 1
25.0%
ت 1
25.0%
Inherited
ValueCountFrequency (%)
́ 4
57.1%
̈ 3
42.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 14336281
99.9%
Punctuation 7261
 
0.1%
None 5921
 
< 0.1%
Cyrillic 4587
 
< 0.1%
Devanagari 77
 
< 0.1%
Telugu 30
 
< 0.1%
Hiragana 20
 
< 0.1%
Tamil 19
 
< 0.1%
Letterlike Symbols 14
 
< 0.1%
CJK 10
 
< 0.1%
Other values (11) 41
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2404430
16.8%
e 1362728
 
9.5%
a 939715
 
6.6%
t 934049
 
6.5%
i 850832
 
5.9%
o 829241
 
5.8%
n 821942
 
5.7%
s 767214
 
5.4%
r 743641
 
5.2%
h 600332
 
4.2%
Other values (82) 4082157
28.5%
Punctuation
ValueCountFrequency (%)
3842
52.9%
881
 
12.1%
688
 
9.5%
670
 
9.2%
633
 
8.7%
303
 
4.2%
192
 
2.6%
31
 
0.4%
7
 
0.1%
5
 
0.1%
Other values (4) 9
 
0.1%
None
ValueCountFrequency (%)
é 1544
26.1%
ä 294
 
5.0%
á 293
 
4.9%
ö 250
 
4.2%
í 243
 
4.1%
è 209
 
3.5%
ü 178
 
3.0%
ı 165
 
2.8%
ó 164
 
2.8%
ç 158
 
2.7%
Other values (141) 2423
40.9%
Cyrillic
ValueCountFrequency (%)
о 470
 
10.2%
е 404
 
8.8%
а 373
 
8.1%
н 323
 
7.0%
и 299
 
6.5%
т 265
 
5.8%
р 240
 
5.2%
с 218
 
4.8%
в 173
 
3.8%
л 161
 
3.5%
Other values (46) 1661
36.2%
Letterlike Symbols
ValueCountFrequency (%)
14
100.0%
Devanagari
ValueCountFrequency (%)
11
 
14.3%
6
 
7.8%
6
 
7.8%
5
 
6.5%
4
 
5.2%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
Other values (21) 30
39.0%
Alphabetic PF
ValueCountFrequency (%)
4
100.0%
Hiragana
ValueCountFrequency (%)
4
20.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
Other values (7) 7
35.0%
Diacriticals
ValueCountFrequency (%)
́ 4
57.1%
̈ 3
42.9%
Telugu
ValueCountFrequency (%)
ి 4
13.3%
3
10.0%
3
10.0%
3
10.0%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
Other values (6) 6
20.0%
Tamil
ValueCountFrequency (%)
3
15.8%
2
10.5%
2
10.5%
2
10.5%
2
10.5%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
Other values (3) 3
15.8%
Arabic
ValueCountFrequency (%)
م 2
50.0%
ہ 1
25.0%
ت 1
25.0%
Hangul
ValueCountFrequency (%)
2
22.2%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Number Forms
ValueCountFrequency (%)
2
100.0%
Modifier Letters
ValueCountFrequency (%)
ʼ 2
100.0%
Thai
ValueCountFrequency (%)
2
25.0%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
CJK
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Math Operators
ValueCountFrequency (%)
1
100.0%
Katakana
ValueCountFrequency (%)
1
100.0%
Currency Symbols
ValueCountFrequency (%)
1
50.0%
1
50.0%
Specials
ValueCountFrequency (%)
1
100.0%

popularity
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 43718
Distinct (%): 96.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.926248
Minimum: 0
Maximum: 547.4883
Zeros: 40
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 354.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.020821
Q1: 0.388826
Median: 1.130269
Q3: 3.68961
95-th percentile: 11.06377
Maximum: 547.4883
Range: 547.4883
Interquartile range (IQR): 3.300784

Descriptive statistics

Standard deviation: 6.0110226
Coefficient of variation (CV): 2.054174
Kurtosis: 1923.277
Mean: 2.926248
Median Absolute Deviation (MAD): 0.967345
Skewness: 29.215271
Sum: 132690.71
Variance: 36.132393
Monotonicity: Not monotonic
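
A long right tail like this one is often compressed with a log transform before modelling. The report does not prescribe any transform; the sketch below is only an illustration on a toy sample built from the seven quantile values reported above, not the real column, and it uses the simple population skewness formula, which may differ from the estimator the profiling tool applies.

```python
import math

def skewness(values):
    """Population skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Toy sample: the reported quantiles of popularity, NOT the actual data.
toy_popularity = [0.0, 0.020821, 0.388826, 1.130269, 3.68961, 11.06377, 547.4883]

raw_skew = skewness(toy_popularity)
# log1p(x) = log(1 + x) is defined at zero, which matters because zeros occur.
log_skew = skewness([math.log1p(v) for v in toy_popularity])

print(f"skewness before log1p: {raw_skew:.2f}")
print(f"skewness after log1p:  {log_skew:.2f}")
```

On this toy sample the transformed values are less skewed than the raw ones; how far that carries over to the full column would need to be checked on the data itself.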
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
1 × 10⁻⁶               56      0.1%
0.000308               42      0.1%
0                      40      0.1%
0.00022                39      0.1%
0.000578               38      0.1%
0.001177               38      0.1%
0.000844               38      0.1%
0.002001               27      0.1%
0.003013               21      < 0.1%
0.00353                19      < 0.1%
Other values (43708)   44987   99.2%

Value                  Count   Frequency (%)
0                      40      0.1%
1 × 10⁻⁶               56      0.1%
2 × 10⁻⁶               6       < 0.1%
3 × 10⁻⁶               6       < 0.1%
4 × 10⁻⁶               5       < 0.1%
5 × 10⁻⁶               1       < 0.1%
6 × 10⁻⁶               2       < 0.1%
7 × 10⁻⁶               1       < 0.1%
8 × 10⁻⁶               6       < 0.1%
9 × 10⁻⁶               2       < 0.1%

Value                  Count   Frequency (%)
547.488298             1       < 0.1%
294.337037             1       < 0.1%
287.253654             1       < 0.1%
228.032744             1       < 0.1%
213.849907             1       < 0.1%
187.860492             1       < 0.1%
185.330992             1       < 0.1%
185.070892             1       < 0.1%
183.870374             1       < 0.1%
154.801009             1       < 0.1%

production_companies
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 22666
Distinct (%): 67.5%
Missing: 11788
Missing (%): 26.0%
Memory size: 354.4 KiB

Value                                    Count
Metro-Goldwyn-Mayer (MGM)                742
Warner Bros.                             540
Paramount Pictures                       504
Twentieth Century Fox Film Corporation   439
Universal Pictures                       320
Other values (22661)                     31012

Length

Max length: 609
Median length: 412
Mean length: 41.476532
Min length: 2

Characters and Unicode

Total characters: 1391828
Distinct characters: 294
Distinct categories: 17
Distinct scripts: 6
Distinct blocks: 6

Unique

Unique: 20318
Unique (%): 60.5%

Sample

1st row: Pixar Animation Studios
2nd row: TriStar Pictures, Teitler Film, Interscope Communications
3rd row: Warner Bros., Lancaster Gate
4th row: Twentieth Century Fox Film Corporation
5th row: Sandollar Productions, Touchstone Pictures

Common Values

Value                                    Count   Frequency (%)
Metro-Goldwyn-Mayer (MGM)                742     1.6%
Warner Bros.                             540     1.2%
Paramount Pictures                       504     1.1%
Twentieth Century Fox Film Corporation   439     1.0%
Universal Pictures                       320     0.7%
RKO Radio Pictures                       247     0.5%
Columbia Pictures Corporation            207     0.5%
Columbia Pictures                        146     0.3%
Mosfilm                                  145     0.3%
Walt Disney Pictures                     85      0.2%
Other values (22656)                     30182   66.6%
(Missing)                                11788   26.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
films 9447
 
5.3%
pictures 9262
 
5.2%
productions 9054
 
5.1%
film 6669
 
3.8%
entertainment 5152
 
2.9%
corporation 2189
 
1.2%
company 1767
 
1.0%
warner 1478
 
0.8%
bros 1411
 
0.8%
the 1378
 
0.8%
Other values (18616) 129677
73.1%

Most occurring characters

ValueCountFrequency (%)
143936
 
10.3%
i 106827
 
7.7%
e 94541
 
6.8%
n 89879
 
6.5%
o 85214
 
6.1%
r 83473
 
6.0%
t 83347
 
6.0%
a 77037
 
5.5%
s 62615
 
4.5%
l 51198
 
3.7%
Other values (284) 513761
36.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 985972
70.8%
Uppercase Letter 198744
 
14.3%
Space Separator 143941
 
10.3%
Other Punctuation 45024
 
3.2%
Decimal Number 4336
 
0.3%
Dash Punctuation 4325
 
0.3%
Open Punctuation 4321
 
0.3%
Close Punctuation 4320
 
0.3%
Math Symbol 660
 
< 0.1%
Other Letter 140
 
< 0.1%
Other values (7) 45
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 106827
10.8%
e 94541
9.6%
n 89879
9.1%
o 85214
8.6%
r 83473
8.5%
t 83347
8.5%
a 77037
 
7.8%
s 62615
 
6.4%
l 51198
 
5.2%
m 44226
 
4.5%
Other values (102) 207615
21.1%
Other Letter
ValueCountFrequency (%)
9
 
6.4%
8
 
5.7%
6
 
4.3%
5
 
3.6%
5
 
3.6%
5
 
3.6%
5
 
3.6%
5
 
3.6%
4
 
2.9%
3
 
2.1%
Other values (62) 85
60.7%
Uppercase Letter
ValueCountFrequency (%)
P 27858
14.0%
F 26326
13.2%
C 20552
 
10.3%
M 13342
 
6.7%
S 11899
 
6.0%
E 9742
 
4.9%
A 9532
 
4.8%
T 9344
 
4.7%
B 8995
 
4.5%
G 7802
 
3.9%
Other values (52) 53352
26.8%
Other Punctuation
ValueCountFrequency (%)
, 37294
82.8%
. 5660
 
12.6%
& 764
 
1.7%
/ 642
 
1.4%
' 450
 
1.0%
" 133
 
0.3%
! 36
 
0.1%
% 18
 
< 0.1%
: 9
 
< 0.1%
@ 5
 
< 0.1%
Other values (6) 13
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
2 1032
23.8%
1 710
16.4%
0 639
14.7%
3 555
12.8%
4 480
11.1%
9 205
 
4.7%
6 194
 
4.5%
5 177
 
4.1%
8 173
 
4.0%
7 171
 
3.9%
Open Punctuation
ValueCountFrequency (%)
( 4311
99.8%
[ 9
 
0.2%
1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 4310
99.8%
] 9
 
0.2%
1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
143936
> 99.9%
  5
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 4323
> 99.9%
2
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+ 659
99.8%
| 1
 
0.2%
Other Symbol
ValueCountFrequency (%)
° 23
92.0%
2
 
8.0%
Final Punctuation
ValueCountFrequency (%)
3
50.0%
» 3
50.0%
Other Number
ValueCountFrequency (%)
² 1
50.0%
½ 1
50.0%
Control
ValueCountFrequency (%)
4
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 4
100.0%
Initial Punctuation
ValueCountFrequency (%)
« 3
100.0%
Format
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1184313
85.1%
Common 206970
 
14.9%
Cyrillic 373
 
< 0.1%
Hangul 115
 
< 0.1%
Greek 31
 
< 0.1%
Han 26
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 106827
 
9.0%
e 94541
 
8.0%
n 89879
 
7.6%
o 85214
 
7.2%
r 83473
 
7.0%
t 83347
 
7.0%
a 77037
 
6.5%
s 62615
 
5.3%
l 51198
 
4.3%
m 44226
 
3.7%
Other values (99) 405956
34.3%
Hangul
ValueCountFrequency (%)
9
 
7.8%
8
 
7.0%
6
 
5.2%
5
 
4.3%
5
 
4.3%
5
 
4.3%
5
 
4.3%
5
 
4.3%
4
 
3.5%
3
 
2.6%
Other values (43) 60
52.2%
Common
ValueCountFrequency (%)
143936
69.5%
, 37294
 
18.0%
. 5660
 
2.7%
- 4323
 
2.1%
( 4311
 
2.1%
) 4310
 
2.1%
2 1032
 
0.5%
& 764
 
0.4%
1 710
 
0.3%
+ 659
 
0.3%
Other values (37) 3971
 
1.9%
Cyrillic
ValueCountFrequency (%)
и 34
 
9.1%
о 28
 
7.5%
а 26
 
7.0%
л 22
 
5.9%
н 20
 
5.4%
м 19
 
5.1%
т 17
 
4.6%
е 16
 
4.3%
ь 16
 
4.3%
с 16
 
4.3%
Other values (36) 159
42.6%
Greek
ValueCountFrequency (%)
ο 3
 
9.7%
ν 3
 
9.7%
Ε 2
 
6.5%
λ 2
 
6.5%
η 2
 
6.5%
ρ 2
 
6.5%
τ 2
 
6.5%
ι 2
 
6.5%
Κ 2
 
6.5%
κ 1
 
3.2%
Other values (10) 10
32.3%
Han
ValueCountFrequency (%)
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
1
 
3.8%
1
 
3.8%
1
 
3.8%
Other values (9) 9
34.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1385606
99.6%
None 5703
 
0.4%
Cyrillic 373
 
< 0.1%
Hangul 113
 
< 0.1%
CJK 26
 
< 0.1%
Punctuation 7
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
143936
 
10.4%
i 106827
 
7.7%
e 94541
 
6.8%
n 89879
 
6.5%
o 85214
 
6.1%
r 83473
 
6.0%
t 83347
 
6.0%
a 77037
 
5.6%
s 62615
 
4.5%
l 51198
 
3.7%
Other values (77) 507539
36.6%
None
ValueCountFrequency (%)
é 3171
55.6%
ó 416
 
7.3%
á 317
 
5.6%
í 173
 
3.0%
ü 154
 
2.7%
ñ 150
 
2.6%
ô 140
 
2.5%
è 136
 
2.4%
ä 136
 
2.4%
ö 131
 
2.3%
Other values (76) 779
 
13.7%
Cyrillic
ValueCountFrequency (%)
и 34
 
9.1%
о 28
 
7.5%
а 26
 
7.0%
л 22
 
5.9%
н 20
 
5.4%
м 19
 
5.1%
т 17
 
4.6%
е 16
 
4.3%
ь 16
 
4.3%
с 16
 
4.3%
Other values (36) 159
42.6%
Hangul
ValueCountFrequency (%)
9
 
8.0%
8
 
7.1%
6
 
5.3%
5
 
4.4%
5
 
4.4%
5
 
4.4%
5
 
4.4%
5
 
4.4%
4
 
3.5%
3
 
2.7%
Other values (42) 58
51.3%
Punctuation
ValueCountFrequency (%)
3
42.9%
2
28.6%
1
 
14.3%
1
 
14.3%
CJK
ValueCountFrequency (%)
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
1
 
3.8%
1
 
3.8%
1
 
3.8%
Other values (9) 9
34.6%

production_countries
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct: 2388
Distinct (%): 6.1%
Missing: 6207
Missing (%): 13.7%
Memory size: 354.4 KiB

Value                 Count
US                    17836
GB                    2235
FR                    1652
JP                    1354
IT                    1029
Other values (2383)   15032

Length

Max length: 98
Median length: 2
Mean length: 3.0439982
Min length: 2

Characters and Unicode

Total characters: 119136
Distinct characters: 28
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1769
Unique (%): 4.5%

Sample

1st row: US
2nd row: US
3rd row: US
4th row: US
5th row: US

Common Values

Value                 Count   Frequency (%)
US                    17836   39.3%
GB                    2235    4.9%
FR                    1652    3.6%
JP                    1354    3.0%
IT                    1029    2.3%
CA                    840     1.9%
DE                    748     1.6%
IN                    735     1.6%
RU                    734     1.6%
GB, US                569     1.3%
Other values (2378)   11406   25.2%
(Missing)             6207    13.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
us 21134
42.8%
gb 4088
 
8.3%
fr 3931
 
8.0%
de 2249
 
4.6%
it 2165
 
4.4%
ca 1765
 
3.6%
jp 1645
 
3.3%
es 964
 
2.0%
ru 911
 
1.8%
in 826
 
1.7%
Other values (151) 9675
19.6%

Most occurring characters

ValueCountFrequency (%)
S 23026
19.3%
U 23010
19.3%
, 10215
8.6%
10215
8.6%
R 6676
 
5.6%
B 4978
 
4.2%
E 4744
 
4.0%
G 4445
 
3.7%
F 4332
 
3.6%
I 4002
 
3.4%
Other values (18) 23493
19.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 98706
82.9%
Other Punctuation 10215
 
8.6%
Space Separator 10215
 
8.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 23026
23.3%
U 23010
23.3%
R 6676
 
6.8%
B 4978
 
5.0%
E 4744
 
4.8%
G 4445
 
4.5%
F 4332
 
4.4%
I 4002
 
4.1%
A 3135
 
3.2%
T 3002
 
3.0%
Other values (16) 17356
17.6%
Other Punctuation
ValueCountFrequency (%)
, 10215
100.0%
Space Separator
ValueCountFrequency (%)
10215
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 98706
82.9%
Common 20430
 
17.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 23026
23.3%
U 23010
23.3%
R 6676
 
6.8%
B 4978
 
5.0%
E 4744
 
4.8%
G 4445
 
4.5%
F 4332
 
4.4%
I 4002
 
4.1%
A 3135
 
3.2%
T 3002
 
3.0%
Other values (16) 17356
17.6%
Common
ValueCountFrequency (%)
, 10215
50.0%
10215
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 119136
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S 23026
19.3%
U 23010
19.3%
, 10215
8.6%
10215
8.6%
R 6676
 
5.6%
B 4978
 
4.2%
E 4744
 
4.0%
G 4445
 
3.7%
F 4332
 
3.6%
I 4002
 
3.4%
Other values (18) 23493
19.7%
Distinct 17333
Distinct (%) 38.2%
Missing 0
Missing (%) 0.0%
Memory size 354.4 KiB
Minimum 1874-12-09 00:00:00
Maximum 2020-12-16 00:00:00
Histogram with fixed size bins (bins=50)

revenue
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct 6863
Distinct (%) 15.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 11125061
Minimum -2.1474836 × 10^9
Maximum 2.0682236 × 10^9
Zeros 37948
Zeros (%) 83.7%
Negative 1
Negative (%) < 0.1%
Memory size 354.4 KiB

Quantile statistics

Minimum -2.1474836 × 10^9
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 48013922
Maximum 2.0682236 × 10^9
Range 4.2157073 × 10^9
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 63886181
Coefficient of variation (CV) 5.7425467
Kurtosis 195.01472
Mean 11125061
Median Absolute Deviation (MAD) 0
Skewness 9.8986957
Sum 5.0446588 × 10^11
Variance 4.0814441 × 10^15
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 37948
83.7%
12000000 20
 
< 0.1%
11000000 19
 
< 0.1%
10000000 19
 
< 0.1%
2000000 18
 
< 0.1%
6000000 17
 
< 0.1%
5000000 14
 
< 0.1%
8000000 13
 
< 0.1%
500000 13
 
< 0.1%
1 12
 
< 0.1%
Other values (6853) 7252
 
16.0%
ValueCountFrequency (%)
-2147483648 1
 
< 0.1%
0 37948
83.7%
1 12
 
< 0.1%
2 3
 
< 0.1%
3 9
 
< 0.1%
4 4
 
< 0.1%
5 5
 
< 0.1%
6 2
 
< 0.1%
7 4
 
< 0.1%
8 5
 
< 0.1%
ValueCountFrequency (%)
2068223624 1
< 0.1%
1845034188 1
< 0.1%
1519557910 1
< 0.1%
1513528810 1
< 0.1%
1506249360 1
< 0.1%
1405403694 1
< 0.1%
1342000000 1
< 0.1%
1274219009 1
< 0.1%
1262886337 1
< 0.1%
1238764765 1
< 0.1%
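
The single negative revenue, -2147483648, equals -2^31, the smallest signed 32-bit integer, and 83.7% of rows are exactly 0. One possible reading is that both are placeholders rather than real figures. The sketch below shows how such values might be masked before analysis; treating 0 as "unknown" is an assumption, and `clean_revenue` is a hypothetical helper, not part of the report.

```python
import math

INT32_MIN = -2**31  # -2147483648, the minimum reported for revenue


def clean_revenue(value):
    """Return NaN for values that look like placeholders, else the value as a float.

    Assumption: zero and negative revenues mean "not recorded", not a true amount.
    """
    if value is None or value == INT32_MIN or value <= 0:
        return math.nan
    return float(value)


# Illustrative values taken from the frequency tables above.
sample = [0, 12000000, -2147483648, 2068223624, 1]
cleaned = [clean_revenue(v) for v in sample]
print(cleaned)
```

Very small positive values such as 1 are kept here; whether they are genuine is a separate judgment call that this sketch does not make.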

runtime
Real number (ℝ)

Distinct 353
Distinct (%) 0.8%
Missing 246
Missing (%) 0.5%
Infinite 0
Infinite (%) 0.0%
Mean 94.179893
Minimum 0
Maximum 1256
Zeros 1534
Zeros (%) 3.4%
Negative 0
Negative (%) 0.0%
Memory size 354.4 KiB

Quantile statistics

Minimum 0
5-th percentile 12
Q1 85
median 95
Q3 107
95-th percentile 138
Maximum 1256
Range 1256
Interquartile range (IQR) 22

Descriptive statistics

Standard deviation 38.346636
Coefficient of variation (CV) 0.40716372
Kurtosis 93.935366
Mean 94.179893
Median Absolute Deviation (MAD) 11
Skewness 4.4921262
Sum 4247419
Variance 1470.4645
Monotonicity Not monotonic
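
Several of the runtime statistics above are related by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the interquartile range is Q3 minus Q1. The following sketch checks those relationships using only the figures printed in this section. The variable names are illustrative.

```python
# Figures copied from the runtime statistics above.
mean = 94.179893
std = 38.346636
q1, q3 = 85, 107
minimum, maximum = 0, 1256

cv = std / mean              # coefficient of variation
variance = std ** 2          # variance is the squared standard deviation
iqr = q3 - q1                # interquartile range
value_range = maximum - minimum

print(f"CV={cv:.6f} variance={variance:.4f} IQR={iqr} range={value_range}")
```

The computed values agree with the reported CV (0.40716372), variance (1470.4645), IQR (22) and range (1256) up to rounding.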
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
90 2548
 
5.6%
0 1534
 
3.4%
100 1470
 
3.2%
95 1409
 
3.1%
93 1212
 
2.7%
96 1104
 
2.4%
92 1078
 
2.4%
94 1061
 
2.3%
91 1055
 
2.3%
88 1030
 
2.3%
Other values (343) 31598
69.7%
ValueCountFrequency (%)
0 1534
3.4%
1 107
 
0.2%
2 33
 
0.1%
3 48
 
0.1%
4 50
 
0.1%
5 51
 
0.1%
6 72
 
0.2%
7 103
 
0.2%
8 78
 
0.2%
9 63
 
0.1%
ValueCountFrequency (%)
1256 1
< 0.1%
1140 2
< 0.1%
931 1
< 0.1%
925 1
< 0.1%
900 1
< 0.1%
877 1
< 0.1%
874 1
< 0.1%
840 2
< 0.1%
780 1
< 0.1%
720 1
< 0.1%

spoken_languages
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct 1841
Distinct (%) 4.4%
Missing 3888
Missing (%) 8.6%
Memory size 354.4 KiB
English
22366 
Français
 
1850
日本語
 
1287
Italiano
 
1217
Español
 
901
Other values (1836)
13836 

Length

Max length 171
Median length 7
Mean length 9.3985817
Min length 2

Characters and Unicode

Total characters 389637
Distinct characters 171
Distinct categories 8
Distinct scripts 15
Distinct blocks 16
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1294
Unique (%) 3.1%

Sample

1st row English
2nd row English, Français
3rd row English
4th row English
5th row English

Common Values

ValueCountFrequency (%)
English 22366
49.3%
Français 1850
 
4.1%
日本語 1287
 
2.8%
Italiano 1217
 
2.7%
Español 901
 
2.0%
Pусский 807
 
1.8%
Deutsch 760
 
1.7%
English, Français 681
 
1.5%
English, Español 572
 
1.3%
हिन्दी 480
 
1.1%
Other values (1831) 10536
23.2%
(Missing) 3888
 
8.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
english 28711
52.8%
français 4191
 
7.7%
deutsch 2623
 
4.8%
español 2412
 
4.4%
italiano 2365
 
4.4%
日本語 1756
 
3.2%
pусский 1562
 
2.9%
普通话 790
 
1.5%
हिन्दी 706
 
1.3%
663
 
1.2%
Other values (69) 8557
 
15.7%

Most occurring characters

ValueCountFrequency (%)
s 42241
10.8%
n 37438
 
9.6%
i 37085
 
9.5%
l 34612
 
8.9%
h 31440
 
8.1%
E 31180
 
8.0%
g 30395
 
7.8%
a 18936
 
4.9%
13073
 
3.4%
, 11660
 
3.0%
Other values (161) 101577
26.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 291857
74.9%
Uppercase Letter 46402
 
11.9%
Other Letter 22170
 
5.7%
Space Separator 13073
 
3.4%
Other Punctuation 12725
 
3.3%
Spacing Mark 1836
 
0.5%
Nonspacing Mark 1548
 
0.4%
Control 26
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 42241
14.5%
n 37438
12.8%
i 37085
12.7%
l 34612
11.9%
h 31440
10.8%
g 30395
10.4%
a 18936
6.5%
o 7049
 
2.4%
r 6124
 
2.1%
t 5975
 
2.0%
Other values (63) 40562
13.9%
Other Letter
ValueCountFrequency (%)
1756
 
7.9%
1756
 
7.9%
1756
 
7.9%
1263
 
5.7%
946
 
4.3%
790
 
3.6%
790
 
3.6%
706
 
3.2%
706
 
3.2%
706
 
3.2%
Other values (46) 10995
49.6%
Uppercase Letter
ValueCountFrequency (%)
E 31180
67.2%
F 4193
 
9.0%
D 2924
 
6.3%
P 2677
 
5.8%
I 2365
 
5.1%
N 828
 
1.8%
L 505
 
1.1%
M 362
 
0.8%
T 308
 
0.7%
Č 283
 
0.6%
Other values (13) 777
 
1.7%
Spacing Mark
ValueCountFrequency (%)
706
38.5%
ि 706
38.5%
136
 
7.4%
ி 111
 
6.0%
94
 
5.1%
47
 
2.6%
18
 
1.0%
18
 
1.0%
Nonspacing Mark
ValueCountFrequency (%)
706
45.6%
ִ 430
27.8%
ְ 215
 
13.9%
111
 
7.2%
68
 
4.4%
18
 
1.2%
Other Punctuation
ValueCountFrequency (%)
, 11660
91.6%
/ 1015
 
8.0%
? 50
 
0.4%
Space Separator
ValueCountFrequency (%)
13073
100.0%
Control
ValueCountFrequency (%)
š 26
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 325870
83.6%
Common 25824
 
6.6%
Han 10476
 
2.7%
Cyrillic 10454
 
2.7%
Devanagari 4236
 
1.1%
Arabic 3339
 
0.9%
Hangul 3252
 
0.8%
Hebrew 1720
 
0.4%
Greek 1704
 
0.4%
Thai 1225
 
0.3%
Other values (5) 1537
 
0.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 42241
13.0%
n 37438
11.5%
i 37085
11.4%
l 34612
10.6%
h 31440
9.6%
E 31180
9.6%
g 30395
9.3%
a 18936
 
5.8%
o 7049
 
2.2%
r 6124
 
1.9%
Other values (50) 49370
15.2%
Cyrillic
ValueCountFrequency (%)
с 3211
30.7%
к 1734
16.6%
и 1679
16.1%
й 1615
15.4%
у 1564
15.0%
а 113
 
1.1%
р 87
 
0.8%
У 53
 
0.5%
ї 53
 
0.5%
н 53
 
0.5%
Other values (12) 292
 
2.8%
Arabic
ValueCountFrequency (%)
ا 536
16.1%
ر 536
16.1%
ل 341
10.2%
ع 341
10.2%
ب 341
10.2%
ي 341
10.2%
ة 341
10.2%
ی 140
 
4.2%
ف 140
 
4.2%
س 140
 
4.2%
Other values (5) 142
 
4.3%
Han
ValueCountFrequency (%)
1756
16.8%
1756
16.8%
1756
16.8%
1263
12.1%
946
9.0%
790
7.5%
790
7.5%
广 473
 
4.5%
473
 
4.5%
473
 
4.5%
Hebrew
ValueCountFrequency (%)
ִ 430
25.0%
ת 215
12.5%
י 215
12.5%
ר 215
12.5%
ְ 215
12.5%
ב 215
12.5%
ע 215
12.5%
Greek
ValueCountFrequency (%)
λ 426
25.0%
ά 213
12.5%
κ 213
12.5%
ι 213
12.5%
ν 213
12.5%
η 213
12.5%
ε 213
12.5%
Georgian
ValueCountFrequency (%)
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
Devanagari
ValueCountFrequency (%)
706
16.7%
706
16.7%
706
16.7%
706
16.7%
706
16.7%
ि 706
16.7%
Hangul
ValueCountFrequency (%)
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
Thai
ValueCountFrequency (%)
350
28.6%
175
14.3%
175
14.3%
175
14.3%
175
14.3%
175
14.3%
Gurmukhi
ValueCountFrequency (%)
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
Common
ValueCountFrequency (%)
13073
50.6%
, 11660
45.2%
/ 1015
 
3.9%
? 50
 
0.2%
š 26
 
0.1%
Telugu
ValueCountFrequency (%)
136
33.3%
68
16.7%
68
16.7%
68
16.7%
68
16.7%
Tamil
ValueCountFrequency (%)
111
20.0%
ி 111
20.0%
111
20.0%
111
20.0%
111
20.0%
Bengali
ValueCountFrequency (%)
94
40.0%
47
20.0%
47
20.0%
47
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 342843
88.0%
CJK 10476
 
2.7%
Cyrillic 10454
 
2.7%
None 10429
 
2.7%
Devanagari 4236
 
1.1%
Arabic 3339
 
0.9%
Hangul 3252
 
0.8%
Hebrew 1720
 
0.4%
Thai 1225
 
0.3%
Tamil 555
 
0.1%
Other values (6) 1108
 
0.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 42241
12.3%
n 37438
10.9%
i 37085
10.8%
l 34612
10.1%
h 31440
9.2%
E 31180
9.1%
g 30395
8.9%
a 18936
 
5.5%
13073
 
3.8%
, 11660
 
3.4%
Other values (38) 54783
16.0%
None
ValueCountFrequency (%)
ç 4438
42.6%
ñ 2412
23.1%
ê 591
 
5.7%
λ 426
 
4.1%
ý 283
 
2.7%
Č 283
 
2.7%
ü 247
 
2.4%
ά 213
 
2.0%
κ 213
 
2.0%
ι 213
 
2.0%
Other values (11) 1110
 
10.6%
Cyrillic
ValueCountFrequency (%)
с 3211
30.7%
к 1734
16.6%
и 1679
16.1%
й 1615
15.4%
у 1564
15.0%
а 113
 
1.1%
р 87
 
0.8%
У 53
 
0.5%
ї 53
 
0.5%
н 53
 
0.5%
Other values (12) 292
 
2.8%
CJK
ValueCountFrequency (%)
1756
16.8%
1756
16.8%
1756
16.8%
1263
12.1%
946
9.0%
790
7.5%
790
7.5%
广 473
 
4.5%
473
 
4.5%
473
 
4.5%
Devanagari
ValueCountFrequency (%)
706
16.7%
706
16.7%
706
16.7%
706
16.7%
706
16.7%
ि 706
16.7%
Hangul
ValueCountFrequency (%)
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
Arabic
ValueCountFrequency (%)
ا 536
16.1%
ر 536
16.1%
ل 341
10.2%
ع 341
10.2%
ب 341
10.2%
ي 341
10.2%
ة 341
10.2%
ی 140
 
4.2%
ف 140
 
4.2%
س 140
 
4.2%
Other values (5) 142
 
4.3%
Hebrew
ValueCountFrequency (%)
ִ 430
25.0%
ת 215
12.5%
י 215
12.5%
ר 215
12.5%
ְ 215
12.5%
ב 215
12.5%
ע 215
12.5%
Thai
ValueCountFrequency (%)
350
28.6%
175
14.3%
175
14.3%
175
14.3%
175
14.3%
175
14.3%
Telugu
ValueCountFrequency (%)
136
33.3%
68
16.7%
68
16.7%
68
16.7%
68
16.7%
Tamil
ValueCountFrequency (%)
111
20.0%
ி 111
20.0%
111
20.0%
111
20.0%
111
20.0%
Bengali
ValueCountFrequency (%)
94
40.0%
47
20.0%
47
20.0%
47
20.0%
Latin Ext Additional
ValueCountFrequency (%)
ế 61
50.0%
61
50.0%
Georgian
ValueCountFrequency (%)
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
Gurmukhi
ValueCountFrequency (%)
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
IPA Ext
ValueCountFrequency (%)
ə 4
100.0%

status
Categorical

Distinct 6
Distinct (%) < 0.1%
Missing 80
Missing (%) 0.2%
Memory size 354.4 KiB
Released
44906 
Rumored
 
229
Post Production
 
97
In Production
 
19
Planned
 
13

Length

Max length 15
Median length 8
Mean length 8.011753
Min length 7

Characters and Unicode

Total characters 362652
Distinct characters 18
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row Released
2nd row Released
3rd row Released
4th row Released
5th row Released

Common Values

ValueCountFrequency (%)
Released 44906
99.0%
Rumored 229
 
0.5%
Post Production 97
 
0.2%
In Production 19
 
< 0.1%
Planned 13
 
< 0.1%
Canceled 1
 
< 0.1%
(Missing) 80
 
0.2%
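
The report's Alerts list flags status as highly imbalanced (97.0%). One common way to quantify imbalance is one minus the normalised Shannon entropy of the class counts. The sketch below applies that definition to the counts in the table above. The formula is an assumption about how such a score could be computed, not a statement of the report's exact method, and `imbalance_score` is a hypothetical helper.

```python
import math


def imbalance_score(counts):
    """Return 1 - H/H_max, where H is the Shannon entropy of the class shares.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    """
    counts = [c for c in counts if c > 0]
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(len(probs))


# Counts from the Common Values table (missing values excluded).
status_counts = {
    "Released": 44906,
    "Rumored": 229,
    "Post Production": 97,
    "In Production": 19,
    "Planned": 13,
    "Canceled": 1,
}
score = imbalance_score(status_counts.values())
print(f"imbalance = {score:.3f}")
```

Under this definition the six status classes score roughly 0.97, because "Released" alone accounts for about 99% of the non-missing rows.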

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
released 44906
99.0%
rumored 229
 
0.5%
production 116
 
0.3%
post 97
 
0.2%
in 19
 
< 0.1%
planned 13
 
< 0.1%
canceled 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 134962
37.2%
d 45265
 
12.5%
R 45135
 
12.4%
s 45003
 
12.4%
l 44920
 
12.4%
a 44920
 
12.4%
o 558
 
0.2%
r 345
 
0.1%
u 345
 
0.1%
m 229
 
0.1%
Other values (8) 970
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 317155
87.5%
Uppercase Letter 45381
 
12.5%
Space Separator 116
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 134962
42.6%
d 45265
 
14.3%
s 45003
 
14.2%
l 44920
 
14.2%
a 44920
 
14.2%
o 558
 
0.2%
r 345
 
0.1%
u 345
 
0.1%
m 229
 
0.1%
t 213
 
0.1%
Other values (3) 395
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
R 45135
99.5%
P 226
 
0.5%
I 19
 
< 0.1%
C 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
116
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 362536
> 99.9%
Common 116
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 134962
37.2%
d 45265
 
12.5%
R 45135
 
12.4%
s 45003
 
12.4%
l 44920
 
12.4%
a 44920
 
12.4%
o 558
 
0.2%
r 345
 
0.1%
u 345
 
0.1%
m 229
 
0.1%
Other values (7) 854
 
0.2%
Common
ValueCountFrequency (%)
116
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 362652
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 134962
37.2%
d 45265
 
12.5%
R 45135
 
12.4%
s 45003
 
12.4%
l 44920
 
12.4%
a 44920
 
12.4%
o 558
 
0.2%
r 345
 
0.1%
u 345
 
0.1%
m 229
 
0.1%
Other values (8) 970
 
0.3%

tagline
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct 20269
Distinct (%) 99.4%
Missing 24958
Missing (%) 55.0%
Memory size 354.4 KiB
Based on a true story.
 
7
Trust no one.
 
4
Be careful what you wish for.
 
4
-
 
4
Documentary
 
3
Other values (20264)
20365 

Length

Max length 297
Median length 204
Mean length 46.996517
Min length 1

Characters and Unicode

Total characters 958118
Distinct characters 170
Distinct categories 17
Distinct scripts 6
Distinct blocks 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 20173
Unique (%) 99.0%

Sample

1st row Roll the dice and unleash the excitement!
2nd row Still Yelling. Still Fighting. Still Ready for Love.
3rd row Friends are the people who let you be yourself... and never let you forget it.
4th row Just When His World Is Back To Normal... He's In For The Surprise Of His Life!
5th row A Los Angeles Crime Saga

Common Values

ValueCountFrequency (%)
Based on a true story. 7
 
< 0.1%
Trust no one. 4
 
< 0.1%
Be careful what you wish for. 4
 
< 0.1%
- 4
 
< 0.1%
Documentary 3
 
< 0.1%
How far would you go? 3
 
< 0.1%
A Love Story 3
 
< 0.1%
Who is John Galt? 3
 
< 0.1%
Some doors should never be opened. 3
 
< 0.1%
Classic Albums 3
 
< 0.1%
Other values (20259) 20350
44.9%
(Missing) 24958
55.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 10987
 
6.3%
a 6810
 
3.9%
of 4401
 
2.5%
to 3581
 
2.1%
is 2793
 
1.6%
in 2691
 
1.5%
and 2681
 
1.5%
you 2388
 
1.4%
1580
 
0.9%
for 1523
 
0.9%
Other values (15100) 134394
77.3%

Most occurring characters

ValueCountFrequency (%)
153590
16.0%
e 94342
 
9.8%
t 57223
 
6.0%
o 56534
 
5.9%
a 51450
 
5.4%
n 47460
 
5.0%
i 46013
 
4.8%
r 44957
 
4.7%
s 42345
 
4.4%
h 37144
 
3.9%
Other values (160) 327060
34.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 680045
71.0%
Space Separator 153590
 
16.0%
Uppercase Letter 74965
 
7.8%
Other Punctuation 44569
 
4.7%
Decimal Number 2687
 
0.3%
Dash Punctuation 1942
 
0.2%
Final Punctuation 98
 
< 0.1%
Open Punctuation 56
 
< 0.1%
Close Punctuation 55
 
< 0.1%
Currency Symbol 37
 
< 0.1%
Other values (7) 74
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 94342
13.9%
t 57223
 
8.4%
o 56534
 
8.3%
a 51450
 
7.6%
n 47460
 
7.0%
i 46013
 
6.8%
r 44957
 
6.6%
s 42345
 
6.2%
h 37144
 
5.5%
l 30159
 
4.4%
Other values (43) 172418
25.4%
Other Letter
ValueCountFrequency (%)
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
Other values (24) 24
70.6%
Uppercase Letter
ValueCountFrequency (%)
T 10007
 
13.3%
A 6871
 
9.2%
S 5648
 
7.5%
H 4401
 
5.9%
I 4387
 
5.9%
E 4304
 
5.7%
W 3678
 
4.9%
O 3476
 
4.6%
L 3193
 
4.3%
N 3193
 
4.3%
Other values (20) 25807
34.4%
Other Punctuation
ValueCountFrequency (%)
. 26640
59.8%
! 5784
 
13.0%
' 5672
 
12.7%
, 4222
 
9.5%
? 1159
 
2.6%
" 582
 
1.3%
148
 
0.3%
: 137
 
0.3%
& 83
 
0.2%
* 42
 
0.1%
Other values (7) 100
 
0.2%
Decimal Number
ValueCountFrequency (%)
0 802
29.8%
1 516
19.2%
2 299
 
11.1%
3 208
 
7.7%
9 208
 
7.7%
5 168
 
6.3%
4 140
 
5.2%
6 121
 
4.5%
7 121
 
4.5%
8 104
 
3.9%
Math Symbol
ValueCountFrequency (%)
+ 5
35.7%
= 5
35.7%
| 2
 
14.3%
~ 1
 
7.1%
1
 
7.1%
Dash Punctuation
ValueCountFrequency (%)
- 1925
99.1%
9
 
0.5%
8
 
0.4%
Final Punctuation
ValueCountFrequency (%)
82
83.7%
15
 
15.3%
» 1
 
1.0%
Initial Punctuation
ValueCountFrequency (%)
14
73.7%
4
 
21.1%
« 1
 
5.3%
Open Punctuation
ValueCountFrequency (%)
( 49
87.5%
[ 7
 
12.5%
Close Punctuation
ValueCountFrequency (%)
) 48
87.3%
] 7
 
12.7%
Other Number
ValueCountFrequency (%)
½ 2
66.7%
² 1
33.3%
Modifier Letter
ValueCountFrequency (%)
ˌ 1
50.0%
ˈ 1
50.0%
Space Separator
ValueCountFrequency (%)
153590
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 37
100.0%
Nonspacing Mark
ValueCountFrequency (%)
1
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 755010
78.8%
Common 203073
 
21.2%
Han 21
 
< 0.1%
Tamil 5
 
< 0.1%
Hiragana 5
 
< 0.1%
Katakana 4
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 94342
 
12.5%
t 57223
 
7.6%
o 56534
 
7.5%
a 51450
 
6.8%
n 47460
 
6.3%
i 46013
 
6.1%
r 44957
 
6.0%
s 42345
 
5.6%
h 37144
 
4.9%
l 30159
 
4.0%
Other values (73) 247383
32.8%
Common
ValueCountFrequency (%)
153590
75.6%
. 26640
 
13.1%
! 5784
 
2.8%
' 5672
 
2.8%
, 4222
 
2.1%
- 1925
 
0.9%
? 1159
 
0.6%
0 802
 
0.4%
" 582
 
0.3%
1 516
 
0.3%
Other values (42) 2181
 
1.1%
Han
ValueCountFrequency (%)
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
Other values (11) 11
52.4%
Tamil
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Hiragana
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Katakana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 957689
> 99.9%
Punctuation 280
 
< 0.1%
None 109
 
< 0.1%
CJK 21
 
< 0.1%
Tamil 5
 
< 0.1%
Hiragana 5
 
< 0.1%
Katakana 4
 
< 0.1%
IPA Ext 2
 
< 0.1%
Modifier Letters 2
 
< 0.1%
Math Operators 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
153590
16.0%
e 94342
 
9.9%
t 57223
 
6.0%
o 56534
 
5.9%
a 51450
 
5.4%
n 47460
 
5.0%
i 46013
 
4.8%
r 44957
 
4.7%
s 42345
 
4.4%
h 37144
 
3.9%
Other values (78) 326631
34.1%
Punctuation
ValueCountFrequency (%)
148
52.9%
82
29.3%
15
 
5.4%
14
 
5.0%
9
 
3.2%
8
 
2.9%
4
 
1.4%
None
ValueCountFrequency (%)
é 17
15.6%
ä 16
14.7%
ö 8
 
7.3%
á 6
 
5.5%
ó 6
 
5.5%
ü 5
 
4.6%
í 5
 
4.6%
ı 5
 
4.6%
· 4
 
3.7%
ć 3
 
2.8%
Other values (26) 34
31.2%
IPA Ext
ValueCountFrequency (%)
ə 2
100.0%
Tamil
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
CJK
ValueCountFrequency (%)
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
Other values (11) 11
52.4%
Katakana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Modifier Letters
ValueCountFrequency (%)
ˌ 1
50.0%
ˈ 1
50.0%
Hiragana
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Math Operators
ValueCountFrequency (%)
1
100.0%

title
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct 42195
Distinct (%) 93.1%
Missing 0
Missing (%) 0.0%
Memory size 354.4 KiB
Cinderella
 
11
Hamlet
 
9
Alice in Wonderland
 
9
Les Misérables
 
8
Beauty and the Beast
 
8
Other values (42190)
45300 

Length

Max length 105
Median length 79
Mean length 16.702393
Min length 1

Characters and Unicode

Total characters 757370
Distinct characters 287
Distinct categories 17
Distinct scripts 7
Distinct blocks 12
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 39891
Unique (%) 88.0%

Sample

1st row Toy Story
2nd row Jumanji
3rd row Grumpier Old Men
4th row Waiting to Exhale
5th row Father of the Bride Part II

Common Values

ValueCountFrequency (%)
Cinderella 11
 
< 0.1%
Hamlet 9
 
< 0.1%
Alice in Wonderland 9
 
< 0.1%
Les Misérables 8
 
< 0.1%
Beauty and the Beast 8
 
< 0.1%
Treasure Island 7
 
< 0.1%
A Christmas Carol 7
 
< 0.1%
The Three Musketeers 7
 
< 0.1%
The Hunters 6
 
< 0.1%
The Stranger 6
 
< 0.1%
Other values (42185) 45267
99.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 14544
 
10.7%
of 4923
 
3.6%
a 2238
 
1.6%
in 1693
 
1.2%
and 1629
 
1.2%
to 1053
 
0.8%
756
 
0.6%
man 665
 
0.5%
love 664
 
0.5%
for 601
 
0.4%
Other values (24353) 107327
78.9%

Most occurring characters

ValueCountFrequency (%)
90770
 
12.0%
e 76195
 
10.1%
a 48911
 
6.5%
o 45633
 
6.0%
n 40797
 
5.4%
r 39993
 
5.3%
i 39748
 
5.2%
t 36705
 
4.8%
s 29499
 
3.9%
h 28498
 
3.8%
Other values (277) 280621
37.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 533780
70.5%
Uppercase Letter 117197
 
15.5%
Space Separator 90770
 
12.0%
Other Punctuation 10484
 
1.4%
Decimal Number 3845
 
0.5%
Dash Punctuation 980
 
0.1%
Close Punctuation 87
 
< 0.1%
Open Punctuation 85
 
< 0.1%
Final Punctuation 38
 
< 0.1%
Other Letter 25
 
< 0.1%
Other values (7) 79
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 76195
14.3%
a 48911
9.2%
o 45633
 
8.5%
n 40797
 
7.6%
r 39993
 
7.5%
i 39748
 
7.4%
t 36705
 
6.9%
s 29499
 
5.5%
h 28498
 
5.3%
l 25903
 
4.9%
Other values (121) 121898
22.8%
Uppercase Letter
ValueCountFrequency (%)
T 16010
13.7%
S 10331
 
8.8%
M 8029
 
6.9%
B 7653
 
6.5%
C 7157
 
6.1%
A 6782
 
5.8%
D 6330
 
5.4%
L 5869
 
5.0%
H 5170
 
4.4%
W 5162
 
4.4%
Other values (65) 38704
33.0%
Other Letter
ValueCountFrequency (%)
چ 2
 
8.0%
ه 2
 
8.0%
ک 2
 
8.0%
ی 2
 
8.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
ª 1
 
4.0%
Other values (11) 11
44.0%
Other Punctuation
ValueCountFrequency (%)
: 3714
35.4%
' 2504
23.9%
. 1603
15.3%
, 1133
 
10.8%
! 647
 
6.2%
& 458
 
4.4%
? 269
 
2.6%
/ 79
 
0.8%
* 19
 
0.2%
# 13
 
0.1%
Other values (8) 45
 
0.4%
Decimal Number
ValueCountFrequency (%)
2 861
22.4%
1 695
18.1%
0 616
16.0%
3 482
12.5%
9 229
 
6.0%
4 228
 
5.9%
5 224
 
5.8%
7 193
 
5.0%
8 161
 
4.2%
6 156
 
4.1%
Math Symbol
ValueCountFrequency (%)
+ 17
70.8%
× 3
 
12.5%
1
 
4.2%
= 1
 
4.2%
1
 
4.2%
1
 
4.2%
Other Number
ValueCountFrequency (%)
½ 12
63.2%
² 3
 
15.8%
³ 2
 
10.5%
1
 
5.3%
1
 
5.3%
Other Symbol
ValueCountFrequency (%)
° 3
37.5%
2
25.0%
1
 
12.5%
1
 
12.5%
1
 
12.5%
Currency Symbol
ValueCountFrequency (%)
$ 18
85.7%
¢ 2
 
9.5%
£ 1
 
4.8%
Dash Punctuation
ValueCountFrequency (%)
- 965
98.5%
15
 
1.5%
Close Punctuation
ValueCountFrequency (%)
) 82
94.3%
] 5
 
5.7%
Open Punctuation
ValueCountFrequency (%)
( 80
94.1%
[ 5
 
5.9%
Final Punctuation
ValueCountFrequency (%)
37
97.4%
1
 
2.6%
Initial Punctuation
ValueCountFrequency (%)
1
50.0%
1
50.0%
Space Separator
ValueCountFrequency (%)
90770
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%
Format
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 650462
85.9%
Common 106368
 
14.0%
Cyrillic 346
 
< 0.1%
Greek 170
 
< 0.1%
Arabic 11
 
< 0.1%
Katakana 8
 
< 0.1%
Han 5
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 76195
 
11.7%
a 48911
 
7.5%
o 45633
 
7.0%
n 40797
 
6.3%
r 39993
 
6.1%
i 39748
 
6.1%
t 36705
 
5.6%
s 29499
 
4.5%
h 28498
 
4.4%
l 25903
 
4.0%
Other values (107) 238580
36.7%
Common
ValueCountFrequency (%)
90770
85.3%
: 3714
 
3.5%
' 2504
 
2.4%
. 1603
 
1.5%
, 1133
 
1.1%
- 965
 
0.9%
2 861
 
0.8%
1 695
 
0.7%
! 647
 
0.6%
0 616
 
0.6%
Other values (50) 2860
 
2.7%
Cyrillic
ValueCountFrequency (%)
о 32
 
9.2%
е 32
 
9.2%
а 29
 
8.4%
н 24
 
6.9%
и 23
 
6.6%
р 22
 
6.4%
к 17
 
4.9%
с 15
 
4.3%
в 14
 
4.0%
т 14
 
4.0%
Other values (38) 124
35.8%
Greek
ValueCountFrequency (%)
α 20
 
11.8%
ο 14
 
8.2%
ι 14
 
8.2%
τ 9
 
5.3%
λ 8
 
4.7%
ά 8
 
4.7%
ρ 8
 
4.7%
ν 7
 
4.1%
ε 6
 
3.5%
ς 6
 
3.5%
Other values (32) 70
41.2%
Katakana
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arabic
ValueCountFrequency (%)
چ 2
18.2%
ه 2
18.2%
ک 2
18.2%
ی 2
18.2%
س 1
9.1%
ا 1
9.1%
ج 1
9.1%
Han
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 755808
99.8%
None 1121
 
0.1%
Cyrillic 346
 
< 0.1%
Punctuation 62
 
< 0.1%
Arabic 11
 
< 0.1%
Katakana 8
 
< 0.1%
CJK 5
 
< 0.1%
Misc Symbols 3
 
< 0.1%
Letterlike Symbols 2
 
< 0.1%
Math Operators 2
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
90770
 
12.0%
e 76195
 
10.1%
a 48911
 
6.5%
o 45633
 
6.0%
n 40797
 
5.4%
r 39993
 
5.3%
i 39748
 
5.3%
t 36705
 
4.9%
s 29499
 
3.9%
h 28498
 
3.8%
Other values (76) 279059
36.9%
None
ValueCountFrequency (%)
é 216
19.3%
ä 127
 
11.3%
ö 55
 
4.9%
è 53
 
4.7%
ô 44
 
3.9%
ü 39
 
3.5%
ó 37
 
3.3%
á 35
 
3.1%
ı 35
 
3.1%
í 33
 
2.9%
Other values (108) 447
39.9%
Punctuation
ValueCountFrequency (%)
37
59.7%
15
24.2%
5
 
8.1%
2
 
3.2%
1
 
1.6%
1
 
1.6%
1
 
1.6%
Cyrillic
ValueCountFrequency (%)
о 32
 
9.2%
е 32
 
9.2%
а 29
 
8.4%
н 24
 
6.9%
и 23
 
6.6%
р 22
 
6.4%
к 17
 
4.9%
с 15
 
4.3%
в 14
 
4.0%
т 14
 
4.0%
Other values (38) 124
35.8%
Arabic
ValueCountFrequency (%)
چ 2
18.2%
ه 2
18.2%
ک 2
18.2%
ی 2
18.2%
س 1
9.1%
ا 1
9.1%
ج 1
9.1%
Misc Symbols
ValueCountFrequency (%)
2
66.7%
1
33.3%
CJK
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Number Forms
ValueCountFrequency (%)
1
100.0%
Letterlike Symbols
ValueCountFrequency (%)
1
50.0%
1
50.0%
Math Operators
ValueCountFrequency (%)
1
50.0%
1
50.0%
Katakana
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arrows
ValueCountFrequency (%)
1
100.0%

vote_average
Real number (ℝ)

Distinct 92
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 5.6243202
Minimum 0
Maximum 10
Zeros 2943
Zeros (%) 6.5%
Negative 0
Negative (%) 0.0%
Memory size 354.4 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 5
median 6
Q3 6.8
95-th percentile 7.8
Maximum 10
Range 10
Interquartile range (IQR) 1.8

Descriptive statistics

Standard deviation 1.915178
Coefficient of variation (CV) 0.34051724
Kurtosis 2.542779
Mean 5.6243202
Median Absolute Deviation (MAD) 0.9
Skewness -1.5243712
Sum 255034.8
Variance 3.6679067
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 2943
 
6.5%
6 2461
 
5.4%
5 1994
 
4.4%
7 1882
 
4.2%
6.5 1722
 
3.8%
6.3 1602
 
3.5%
5.5 1381
 
3.0%
5.8 1369
 
3.0%
6.4 1348
 
3.0%
6.7 1339
 
3.0%
Other values (82) 27304
60.2%
ValueCountFrequency (%)
0 2943
6.5%
0.5 13
 
< 0.1%
0.7 1
 
< 0.1%
1 103
 
0.2%
1.1 1
 
< 0.1%
1.2 4
 
< 0.1%
1.3 13
 
< 0.1%
1.4 5
 
< 0.1%
1.5 30
 
0.1%
1.6 6
 
< 0.1%
ValueCountFrequency (%)
10 185
0.4%
9.8 1
 
< 0.1%
9.6 1
 
< 0.1%
9.5 18
 
< 0.1%
9.4 3
 
< 0.1%
9.3 18
 
< 0.1%
9.2 4
 
< 0.1%
9.1 2
 
< 0.1%
9 158
0.3%
8.9 7
 
< 0.1%

vote_count
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct 1820
Distinct (%) 4.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 110.13772
Minimum 0
Maximum 14075
Zeros 2845
Zeros (%) 6.3%
Negative 0
Negative (%) 0.0%
Memory size 354.4 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 3
median 10
Q3 34
95-th percentile 434.8
Maximum 14075
Range 14075
Interquartile range (IQR) 31

Descriptive statistics

Standard deviation 491.90443
Coefficient of variation (CV) 4.4662666
Kurtosis 150.82809
Mean 110.13772
Median Absolute Deviation (MAD) 8
Skewness 10.437382
Sum 4994195
Variance 241969.97
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1 3240
 
7.1%
2 3127
 
6.9%
0 2845
 
6.3%
3 2780
 
6.1%
4 2477
 
5.5%
5 2096
 
4.6%
6 1747
 
3.9%
7 1568
 
3.5%
8 1359
 
3.0%
9 1194
 
2.6%
Other values (1810) 22912
50.5%
ValueCountFrequency (%)
0 2845
6.3%
1 3240
7.1%
2 3127
6.9%
3 2780
6.1%
4 2477
5.5%
5 2096
4.6%
6 1747
3.9%
7 1568
3.5%
8 1359
3.0%
9 1194
 
2.6%
ValueCountFrequency (%)
14075 1
< 0.1%
12269 1
< 0.1%
12114 1
< 0.1%
12000 1
< 0.1%
11444 1
< 0.1%
11187 1
< 0.1%
10297 1
< 0.1%
10014 1
< 0.1%
9678 1
< 0.1%
9634 1
< 0.1%

Director
Categorical

HIGH CARDINALITY  MISSING 

Distinct 18828
Distinct (%) 42.3%
Missing 835
Missing (%) 1.8%
Memory size 354.4 KiB
John Ford
 
63
Michael Curtiz
 
61
Alfred Hitchcock
 
52
Werner Herzog
 
52
Georges Méliès
 
51
Other values (18823)
44231 

Length

Max length 654
Median length 468
Mean length 15.016873
Min length 2

Characters and Unicode

Total characters 668401
Distinct characters 204
Distinct categories 11
Distinct scripts 6
Distinct blocks 7
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 12107
Unique (%) 27.2%

Sample

1st row John Lasseter
2nd row Joe Johnston
3rd row Howard Deutch
4th row Forest Whitaker
5th row Charles Shyer

Common Values

ValueCountFrequency (%)
John Ford 63
 
0.1%
Michael Curtiz 61
 
0.1%
Alfred Hitchcock 52
 
0.1%
Werner Herzog 52
 
0.1%
Georges Méliès 51
 
0.1%
Woody Allen 47
 
0.1%
Sidney Lumet 45
 
0.1%
Charlie Chaplin 43
 
0.1%
Henry Hathaway 41
 
0.1%
William A. Wellman 41
 
0.1%
Other values (18818) 44014
97.1%
(Missing) 835
 
1.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
john 1228
 
1.2%
michael 942
 
0.9%
david 898
 
0.9%
robert 856
 
0.8%
peter 579
 
0.6%
william 557
 
0.5%
richard 539
 
0.5%
james 524
 
0.5%
paul 471
 
0.5%
george 426
 
0.4%
Other values (18705) 95879
93.2%

Most occurring characters

ValueCountFrequency (%)
58484
 
8.7%
e 57401
 
8.6%
a 56952
 
8.5%
r 44714
 
6.7%
n 44225
 
6.6%
i 43004
 
6.4%
o 38754
 
5.8%
l 30169
 
4.5%
s 22885
 
3.4%
t 21741
 
3.3%
Other values (194) 250072
37.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 495924
74.2%
Uppercase Letter 104741
 
15.7%
Space Separator 58484
 
8.7%
Other Punctuation 7817
 
1.2%
Dash Punctuation 1389
 
0.2%
Other Letter 23
 
< 0.1%
Control 12
 
< 0.1%
Decimal Number 6
 
< 0.1%
Close Punctuation 2
 
< 0.1%
Open Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 57401
11.6%
a 56952
11.5%
r 44714
 
9.0%
n 44225
 
8.9%
i 43004
 
8.7%
o 38754
 
7.8%
l 30169
 
6.1%
s 22885
 
4.6%
t 21741
 
4.4%
h 18310
 
3.7%
Other values (97) 117769
23.7%
Uppercase Letter
ValueCountFrequency (%)
M 9236
 
8.8%
S 8760
 
8.4%
J 7848
 
7.5%
R 6685
 
6.4%
B 6516
 
6.2%
C 6509
 
6.2%
A 6282
 
6.0%
D 5567
 
5.3%
L 5420
 
5.2%
G 5006
 
4.8%
Other values (53) 36912
35.2%
Other Letter
ValueCountFrequency (%)
م 2
 
8.7%
ی 2
 
8.7%
ا 2
 
8.7%
ع 1
 
4.3%
1
 
4.3%
ن 1
 
4.3%
1
 
4.3%
1
 
4.3%
1
 
4.3%
پ 1
 
4.3%
Other values (10) 10
43.5%
Other Punctuation
ValueCountFrequency (%)
, 4479
57.3%
. 3112
39.8%
' 225
 
2.9%
· 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 3
50.0%
5 1
 
16.7%
9 1
 
16.7%
3 1
 
16.7%
Space Separator
ValueCountFrequency (%)
58484
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1389
100.0%
Control
ValueCountFrequency (%)
12
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Math Symbol
ValueCountFrequency (%)
| 1
100.0%

Most occurring scripts

Value     Count   Frequency (%)
Latin     600477  89.8%
Common    67713   10.1%
Cyrillic  188     < 0.1%
Arabic    10      < 0.1%
Han       10      < 0.1%
Hangul    3       < 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 57401
 
9.6%
a 56952
 
9.5%
r 44714
 
7.4%
n 44225
 
7.4%
i 43004
 
7.2%
o 38754
 
6.5%
l 30169
 
5.0%
s 22885
 
3.8%
t 21741
 
3.6%
h 18310
 
3.0%
Other values (123) 222322
37.0%
Cyrillic
ValueCountFrequency (%)
и 22
 
11.7%
о 15
 
8.0%
а 14
 
7.4%
е 14
 
7.4%
к 13
 
6.9%
л 13
 
6.9%
р 13
 
6.9%
н 11
 
5.9%
д 9
 
4.8%
в 6
 
3.2%
Other values (27) 58
30.9%
Common
ValueCountFrequency (%)
58484
86.4%
, 4479
 
6.6%
. 3112
 
4.6%
- 1389
 
2.1%
' 225
 
0.3%
12
 
< 0.1%
0 3
 
< 0.1%
) 2
 
< 0.1%
( 2
 
< 0.1%
5 1
 
< 0.1%
Other values (4) 4
 
< 0.1%
Han
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Arabic
ValueCountFrequency (%)
م 2
20.0%
ی 2
20.0%
ا 2
20.0%
ع 1
10.0%
ن 1
10.0%
پ 1
10.0%
د 1
10.0%
Hangul
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%

Most occurring blocks

Value                 Count   Frequency (%)
ASCII                 663990  99.3%
None                  4197    0.6%
Cyrillic              188     < 0.1%
Arabic                10      < 0.1%
CJK                   10      < 0.1%
Hangul                3       < 0.1%
Latin Ext Additional  3       < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
58484
 
8.8%
e 57401
 
8.6%
a 56952
 
8.6%
r 44714
 
6.7%
n 44225
 
6.7%
i 43004
 
6.5%
o 38754
 
5.8%
l 30169
 
4.5%
s 22885
 
3.4%
t 21741
 
3.3%
Other values (55) 245661
37.0%
None
ValueCountFrequency (%)
é 998
23.8%
á 410
 
9.8%
ö 274
 
6.5%
í 255
 
6.1%
ó 243
 
5.8%
ô 163
 
3.9%
ä 154
 
3.7%
è 147
 
3.5%
ü 117
 
2.8%
ç 112
 
2.7%
Other values (69) 1324
31.5%
Cyrillic
ValueCountFrequency (%)
и 22
 
11.7%
о 15
 
8.0%
а 14
 
7.4%
е 14
 
7.4%
к 13
 
6.9%
л 13
 
6.9%
р 13
 
6.9%
н 11
 
5.9%
д 9
 
4.8%
в 6
 
3.2%
Other values (27) 58
30.9%
Arabic
ValueCountFrequency (%)
م 2
20.0%
ی 2
20.0%
ا 2
20.0%
ع 1
10.0%
ن 1
10.0%
پ 1
10.0%
د 1
10.0%
Hangul
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
Latin Ext Additional
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
CJK
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%

actores
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct: 42656
Distinct (%): 99.2%
Missing: 2348
Missing (%): 5.2%
Memory size: 354.4 KiB

Value                 Count
Georges Méliès        24
Louis Theroux         15
Mel Blanc             12
Jimmy Carr            9
Werner Herzog         8
Other values (42651)  42929

Length

Max length: 4551
Median length: 1364
Mean length: 198.06745
Min length: 4

Characters and Unicode

Total characters: 8516306
Distinct characters: 395
Distinct categories: 16
Distinct scripts: 9
Distinct blocks: 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 42490
Unique (%): 98.8%

Sample

1st row: Tom Hanks, Tim Allen, Don Rickles, Jim Varney, Wallace Shawn, John Ratzenberger, Annie Potts, John Morris, Erik von Detten, Laurie Metcalf, R. Lee Ermey, Sarah Freeman, Penn Jillette
2nd row: Robin Williams, Jonathan Hyde, Kirsten Dunst, Bradley Pierce, Bonnie Hunt, Bebe Neuwirth, David Alan Grier, Patricia Clarkson, Adam Hann-Byrd, Laura Bell Bundy, James Handy, Gillian Barber, Brandon Obray, Cyrus Thiedeke, Gary Joseph Thorup, Leonard Zola, Lloyd Berry, Malcolm Stewart, Annabel Kershaw, Darryl Henriques, Robyn Driscoll, Peter Bryant, Sarah Gilson, Florica Vlad, June Lion, Brenda Lockmuller
3rd row: Walter Matthau, Jack Lemmon, Ann-Margret, Sophia Loren, Daryl Hannah, Burgess Meredith, Kevin Pollak
4th row: Whitney Houston, Angela Bassett, Loretta Devine, Lela Rochon, Gregory Hines, Dennis Haysbert, Michael Beach, Mykelti Williamson, Lamont Johnson, Wesley Snipes
5th row: Steve Martin, Diane Keaton, Martin Short, Kimberly Williams-Paisley, George Newbern, Kieran Culkin, BD Wong, Peter Michael Goetz, Kate McGregor-Stewart, Jane Adams, Eugene Levy, Lori Alan

Common Values

Value                 Count  Frequency (%)
Georges Méliès        24     0.1%
Louis Theroux         15     < 0.1%
Mel Blanc             12     < 0.1%
Jimmy Carr            9      < 0.1%
Werner Herzog         8      < 0.1%
Louis C.K.            8      < 0.1%
David Attenborough    8      < 0.1%
George Carlin         8      < 0.1%
Trevor Noah           6      < 0.1%
Jim Jefferies         6      < 0.1%
Other values (42646)  42893  94.6%
(Missing)             2348   5.2%

Length

Histogram of lengths of the category
Value                  Count    Frequency (%)
john                   9804     0.8%
michael                7451     0.6%
david                  6181     0.5%
robert                 5719     0.5%
james                  5687     0.5%
richard                4443     0.4%
paul                   4313     0.4%
peter                  3901     0.3%
william                3431     0.3%
george                 3412     0.3%
Other values (112933)  1110138  95.3%

Most occurring characters

ValueCountFrequency (%)
1121611
 
13.2%
a 704559
 
8.3%
e 664974
 
7.8%
n 523889
 
6.2%
, 519240
 
6.1%
r 497134
 
5.8%
i 483747
 
5.7%
o 423609
 
5.0%
l 366293
 
4.3%
s 255769
 
3.0%
Other values (385) 2955481
34.7%

Most occurring categories

Value                Count    Frequency (%)
Lowercase Letter     5648356  66.3%
Uppercase Letter     1189882  14.0%
Space Separator      1121614  13.2%
Other Punctuation    541536   6.4%
Dash Punctuation     14094    0.2%
Other Letter         543      < 0.1%
Decimal Number       94       < 0.1%
Final Punctuation    83       < 0.1%
Initial Punctuation  23       < 0.1%
Open Punctuation     23       < 0.1%
Other values (6)     58       < 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 704559
12.5%
e 664974
11.8%
n 523889
9.3%
r 497134
 
8.8%
i 483747
 
8.6%
o 423609
 
7.5%
l 366293
 
6.5%
s 255769
 
4.5%
t 253088
 
4.5%
h 197801
 
3.5%
Other values (138) 1277493
22.6%
Other Letter
ValueCountFrequency (%)
ا 32
 
5.9%
م 31
 
5.7%
ی 19
 
3.5%
ع 19
 
3.5%
ن 18
 
3.3%
د 17
 
3.1%
ر 17
 
3.1%
17
 
3.1%
ي 16
 
2.9%
12
 
2.2%
Other values (104) 345
63.5%
Uppercase Letter
ValueCountFrequency (%)
M 109289
 
9.2%
S 92274
 
7.8%
C 83974
 
7.1%
J 83292
 
7.0%
B 82323
 
6.9%
A 70788
 
5.9%
R 67354
 
5.7%
D 65862
 
5.5%
L 61136
 
5.1%
G 54642
 
4.6%
Other values (81) 418948
35.2%
Decimal Number
ValueCountFrequency (%)
5 37
39.4%
0 29
30.9%
1 8
 
8.5%
2 8
 
8.5%
9 4
 
4.3%
3 2
 
2.1%
7 2
 
2.1%
4 2
 
2.1%
8 1
 
1.1%
6 1
 
1.1%
Other Punctuation
ValueCountFrequency (%)
, 519240
95.9%
. 16045
 
3.0%
' 6095
 
1.1%
" 129
 
< 0.1%
· 9
 
< 0.1%
& 6
 
< 0.1%
: 6
 
< 0.1%
! 5
 
< 0.1%
/ 1
 
< 0.1%
Nonspacing Mark
ValueCountFrequency (%)
́ 10
58.8%
2
 
11.8%
1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
Final Punctuation
ValueCountFrequency (%)
74
89.2%
6
 
7.2%
» 3
 
3.6%
Space Separator
ValueCountFrequency (%)
1121611
> 99.9%
  3
 
< 0.1%
Initial Punctuation
ValueCountFrequency (%)
20
87.0%
« 3
 
13.0%
Open Punctuation
ValueCountFrequency (%)
14
60.9%
( 9
39.1%
Format
ValueCountFrequency (%)
5
83.3%
1
 
16.7%
Dash Punctuation
ValueCountFrequency (%)
- 14094
100.0%
Control
ValueCountFrequency (%)
21
100.0%
Close Punctuation
ValueCountFrequency (%)
) 9
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 3
100.0%
Modifier Symbol
ValueCountFrequency (%)
´ 2
100.0%

Most occurring scripts

Value      Count    Frequency (%)
Latin      6835154  80.3%
Common     1677507  19.7%
Cyrillic   3070     < 0.1%
Han        276      < 0.1%
Arabic     241      < 0.1%
Thai       27       < 0.1%
Greek      14       < 0.1%
Inherited  11       < 0.1%
Hangul     6        < 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 704559
 
10.3%
e 664974
 
9.7%
n 523889
 
7.7%
r 497134
 
7.3%
i 483747
 
7.1%
o 423609
 
6.2%
l 366293
 
5.4%
s 255769
 
3.7%
t 253088
 
3.7%
h 197801
 
2.9%
Other values (163) 2464291
36.1%
Han
ValueCountFrequency (%)
17
 
6.2%
12
 
4.3%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
9
 
3.3%
9
 
3.3%
Other values (55) 163
59.1%
Cyrillic
ValueCountFrequency (%)
а 323
 
10.5%
и 315
 
10.3%
о 233
 
7.6%
н 229
 
7.5%
р 215
 
7.0%
е 174
 
5.7%
л 155
 
5.0%
к 136
 
4.4%
т 115
 
3.7%
с 109
 
3.6%
Other values (51) 1066
34.7%
Common
ValueCountFrequency (%)
1121611
66.9%
, 519240
31.0%
. 16045
 
1.0%
- 14094
 
0.8%
' 6095
 
0.4%
" 129
 
< 0.1%
74
 
< 0.1%
5 37
 
< 0.1%
0 29
 
< 0.1%
21
 
< 0.1%
Other values (24) 132
 
< 0.1%
Arabic
ValueCountFrequency (%)
ا 32
13.3%
م 31
12.9%
ی 19
 
7.9%
ع 19
 
7.9%
ن 18
 
7.5%
د 17
 
7.1%
ر 17
 
7.1%
ي 16
 
6.6%
ل 9
 
3.7%
س 8
 
3.3%
Other values (18) 55
22.8%
Thai
ValueCountFrequency (%)
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
Other values (11) 11
40.7%
Hangul
ValueCountFrequency (%)
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
Greek
ValueCountFrequency (%)
ν 6
42.9%
Ζ 2
 
14.3%
α 2
 
14.3%
ί 2
 
14.3%
ο 2
 
14.3%
Inherited
ValueCountFrequency (%)
́ 10
90.9%
1
 
9.1%

Most occurring blocks

Value                 Count    Frequency (%)
ASCII                 8474252  99.5%
None                  38248    0.4%
Cyrillic              3070     < 0.1%
CJK                   276      < 0.1%
Arabic                241      < 0.1%
Punctuation           120      < 0.1%
Latin Ext Additional  56       < 0.1%
Thai                  27       < 0.1%
Diacriticals          10       < 0.1%
Hangul                6        < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1121611
 
13.2%
a 704559
 
8.3%
e 664974
 
7.8%
n 523889
 
6.2%
, 519240
 
6.1%
r 497134
 
5.9%
i 483747
 
5.7%
o 423609
 
5.0%
l 366293
 
4.3%
s 255769
 
3.0%
Other values (66) 2913427
34.4%
None
ValueCountFrequency (%)
é 9072
23.7%
á 4155
 
10.9%
í 2756
 
7.2%
ô 2330
 
6.1%
ö 2014
 
5.3%
ó 1881
 
4.9%
ü 1492
 
3.9%
ć 1360
 
3.6%
è 1243
 
3.2%
ä 994
 
2.6%
Other values (111) 10951
28.6%
Cyrillic
ValueCountFrequency (%)
а 323
 
10.5%
и 315
 
10.3%
о 233
 
7.6%
н 229
 
7.5%
р 215
 
7.0%
е 174
 
5.7%
л 155
 
5.0%
к 136
 
4.4%
т 115
 
3.7%
с 109
 
3.6%
Other values (51) 1066
34.7%
Punctuation
ValueCountFrequency (%)
74
61.7%
20
 
16.7%
14
 
11.7%
6
 
5.0%
5
 
4.2%
1
 
0.8%
Arabic
ValueCountFrequency (%)
ا 32
13.3%
م 31
12.9%
ی 19
 
7.9%
ع 19
 
7.9%
ن 18
 
7.5%
د 17
 
7.1%
ر 17
 
7.1%
ي 16
 
6.6%
ل 9
 
3.7%
س 8
 
3.3%
Other values (18) 55
22.8%
CJK
ValueCountFrequency (%)
17
 
6.2%
12
 
4.3%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
9
 
3.3%
9
 
3.3%
Other values (55) 163
59.1%
Latin Ext Additional
ValueCountFrequency (%)
15
26.8%
9
16.1%
6
 
10.7%
6
 
10.7%
ế 5
 
8.9%
4
 
7.1%
4
 
7.1%
4
 
7.1%
2
 
3.6%
1
 
1.8%
Diacriticals
ValueCountFrequency (%)
́ 10
100.0%
Thai
ValueCountFrequency (%)
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
Other values (11) 11
40.7%
Hangul
ValueCountFrequency (%)
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%

release_year
Real number (ℝ)

Distinct: 135
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1991.8823
Minimum: 1874
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 354.4 KiB

Quantile statistics

Minimum: 1874
5-th percentile: 1941
Q1: 1978
Median: 2001
Q3: 2010
95-th percentile: 2015
Maximum: 2020
Range: 146
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 24.053016
Coefficient of variation (CV): 0.012075521
Kurtosis: 0.84033154
Mean: 1991.8823
Median Absolute Deviation (MAD): 12
Skewness: -1.2247734
Sum: 90321902
Variance: 578.54758
Monotonicity: Not monotonic
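
Several of the figures above are derived from the others, which gives a quick consistency check. The sketch below recomputes range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation of release_year. The variable names are illustrative only, and the inputs are the rounded values printed in this report, so the results agree only to the printed precision.

```python
# Values copied from the release_year statistics in this report.
minimum, maximum = 1874, 2020
q1, q3 = 1978, 2010
mean, std = 1991.8823, 24.053016

value_range = maximum - minimum  # reported Range: 146
iqr = q3 - q1                    # reported IQR: 32
cv = std / mean                  # reported CV: 0.012075521
variance = std ** 2              # reported Variance: 578.54758

print(value_range, iqr, round(cv, 9), round(variance, 4))
```

The very small CV reflects that years cluster tightly relative to their magnitude (around 1992); it says little about how spread out the release years are in absolute terms, for which the IQR of 32 years is more informative.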
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
2014                1973   4.4%
2015                1904   4.2%
2013                1887   4.2%
2012                1721   3.8%
2011                1666   3.7%
2016                1604   3.5%
2009                1585   3.5%
2010                1501   3.3%
2008                1470   3.2%
2007                1319   2.9%
Other values (125)  28715  63.3%

Minimum 10 values

Value  Count  Frequency (%)
1874   1      < 0.1%
1878   1      < 0.1%
1883   1      < 0.1%
1887   1      < 0.1%
1888   2      < 0.1%
1890   5      < 0.1%
1891   6      < 0.1%
1892   3      < 0.1%
1893   1      < 0.1%
1894   13     < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
2020   1      < 0.1%
2018   5      < 0.1%
2017   531    1.2%
2016   1604   3.5%
2015   1904   4.2%
2014   1973   4.4%
2013   1887   4.2%
2012   1721   3.8%
2011   1666   3.7%
2010   1501   3.3%

return
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 5232
Distinct (%): 11.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 660.49327
Minimum: -9.0611124
Maximum: 12396383
Zeros: 39970
Zeros (%): 88.1%
Negative: 1
Negative (%): < 0.1%
Memory size: 354.4 KiB

Quantile statistics

Minimum: -9.0611124
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2.5333333
Maximum: 12396383
Range: 12396392
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 74718.82
Coefficient of variation (CV): 113.12579
Kurtosis: 20658.832
Mean: 660.49327
Median Absolute Deviation (MAD): 0
Skewness: 138.28226
Sum: 29950067
Variance: 5.5829021 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
0                    39970  88.1%
1                    20     < 0.1%
2                    12     < 0.1%
4                    11     < 0.1%
5                    8      < 0.1%
3                    7      < 0.1%
1.333333333          7      < 0.1%
2.5                  7      < 0.1%
1.5                  6      < 0.1%
4.666666667          4      < 0.1%
Other values (5222)  5293   11.7%

Minimum 10 values

Value                 Count  Frequency (%)
-9.061112439          1      < 0.1%
0                     39970  88.1%
5.217391304 × 10^-7   1      < 0.1%
7.5 × 10^-7           1      < 0.1%
9.375 × 10^-7         1      < 0.1%
1.499133126 × 10^-6   1      < 0.1%
1.8 × 10^-6           1      < 0.1%
1.916666667 × 10^-6   1      < 0.1%
3.5 × 10^-6           1      < 0.1%
4 × 10^-6             1      < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
12396383     1      < 0.1%
8500000      1      < 0.1%
4197476.625  1      < 0.1%
2755584      1      < 0.1%
1018619.283  1      < 0.1%
1000000      1      < 0.1%
26881.72043  1      < 0.1%
12890.38667  1      < 0.1%
5330.33945   1      < 0.1%
4133.333333  1      < 0.1%
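
The report does not define how return was computed, but the sample rows in this report are consistent with revenue divided by budget, with 0 recorded when the budget is 0 (for example Toy Story: 373554033 / 30000000 ≈ 12.451801). The sketch below encodes that reading as an assumption; `compute_return` is a hypothetical helper, not the dataset author's code, and it does not explain the single negative value reported above.

```python
def compute_return(revenue, budget):
    """Revenue-to-budget ratio, with 0.0 when the budget is zero or missing.

    Assumption: inferred from the sample rows of this report, not from a
    documented definition of the 'return' column.
    """
    if not budget:
        return 0.0
    return revenue / budget


# Revenue and budget pairs taken from the first sample rows.
print(round(compute_return(373554033, 30000000), 6))  # Toy Story
print(round(compute_return(262797249, 65000000), 6))  # Jumanji
print(compute_return(76578911, 0))                    # Father of the Bride Part II
```

Under this reading, the 88.1% of zeros mostly reflects rows where budget or revenue is recorded as 0 rather than films that truly returned nothing, which is worth keeping in mind when interpreting the mean and skewness.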

Interactions

[Figure grid: pairwise interaction plots of the numeric variables]

Correlations

                   budget  id      popularity  revenue  runtime  vote_average  vote_count  release_year  return  original_language  status
budget             1.000   -0.255  0.463       0.644    0.227    0.072         0.484       0.141         0.775   0.000              0.000
id                 -0.255  1.000   -0.410      -0.277   -0.206   -0.149        -0.433      0.392         -0.261  0.071              0.056
popularity         0.463   -0.410  1.000       0.491    0.307    0.241         0.894       0.186         0.447   0.000              0.000
revenue            0.644   -0.277  0.491       1.000    0.254    0.126         0.513       0.103         0.853   0.000              0.000
runtime            0.227   -0.206  0.307       0.254    1.000    0.193         0.290       0.034         0.234   0.111              0.000
vote_average       0.072   -0.149  0.241       0.126    0.193    1.000         0.317       -0.009        0.119   0.070              0.019
vote_count         0.484   -0.433  0.894       0.513    0.290    0.317         1.000       0.197         0.474   0.000              0.000
release_year       0.141   0.392   0.186       0.103    0.034    -0.009        0.197       1.000         0.087   0.144              0.028
return             0.775   -0.261  0.447       0.853    0.234    0.119         0.474       0.087         1.000   0.000              0.000
original_language  0.000   0.071   0.000       0.000    0.111    0.070         0.000       0.144         0.000   1.000              0.000
status             0.000   0.056   0.000       0.000    0.000    0.019         0.000       0.028         0.000   0.000              1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
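
To make the idea of nullity correlation concrete, the sketch below computes the Pearson correlation between the missing-value indicators of two columns using only the standard library. The function name `nullity_correlation` and the toy columns are hypothetical and are not taken from the tool that produced this report; missing values are represented as `None` here for simplicity.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Undefined when a column is never missing or always missing.
        return float("nan")
    return cov / sqrt(var_a * var_b)


# Toy columns (illustrative only): the first two are missing together,
# the third is missing exactly where the others are present.
companies = ["Pixar", None, "Warner Bros.", None]
countries = ["US", None, "US", None]
tagline = [None, "Roll the dice", None, "Meet the Creeper"]

print(nullity_correlation(companies, countries))  # columns missing together
print(nullity_correlation(companies, tagline))    # columns missing in opposition
```

A value near 1 means two columns tend to be missing in the same rows, as the sample rows suggest for production_companies and production_countries; a value near -1 means one tends to be present when the other is absent.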

Sample

budgetgenresidoriginal_languageoverviewpopularityproduction_companiesproduction_countriesrelease_daterevenueruntimespoken_languagesstatustaglinetitlevote_averagevote_countDirectoractoresrelease_yearreturn
030000000Animation, Comedy, Family862.0enLed by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences.21.946943Pixar Animation StudiosUS1995-10-3037355403381.0EnglishReleasedNaNToy Story7.75415.0John LasseterTom Hanks, Tim Allen, Don Rickles, Jim Varney, Wallace Shawn, John Ratzenberger, Annie Potts, John Morris, Erik von Detten, Laurie Metcalf, R. Lee Ermey, Sarah Freeman, Penn Jillette199512.451801
165000000Adventure, Fantasy, Family8844.0enWhen siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures.17.015539TriStar Pictures, Teitler Film, Interscope CommunicationsUS1995-12-15262797249104.0English, FrançaisReleasedRoll the dice and unleash the excitement!Jumanji6.92413.0Joe JohnstonRobin Williams, Jonathan Hyde, Kirsten Dunst, Bradley Pierce, Bonnie Hunt, Bebe Neuwirth, David Alan Grier, Patricia Clarkson, Adam Hann-Byrd, Laura Bell Bundy, James Handy, Gillian Barber, Brandon Obray, Cyrus Thiedeke, Gary Joseph Thorup, Leonard Zola, Lloyd Berry, Malcolm Stewart, Annabel Kershaw, Darryl Henriques, Robyn Driscoll, Peter Bryant, Sarah Gilson, Florica Vlad, June Lion, Brenda Lockmuller19954.043035
20Romance, Comedy15602.0enA family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max.11.712900Warner Bros., Lancaster GateUS1995-12-220101.0EnglishReleasedStill Yelling. Still Fighting. Still Ready for Love.Grumpier Old Men6.592.0Howard DeutchWalter Matthau, Jack Lemmon, Ann-Margret, Sophia Loren, Daryl Hannah, Burgess Meredith, Kevin Pollak19950.000000
316000000Comedy, Drama, Romance31357.0enCheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe.3.859495Twentieth Century Fox Film CorporationUS1995-12-2281452156127.0EnglishReleasedFriends are the people who let you be yourself... and never let you forget it.Waiting to Exhale6.134.0Forest WhitakerWhitney Houston, Angela Bassett, Loretta Devine, Lela Rochon, Gregory Hines, Dennis Haysbert, Michael Beach, Mykelti Williamson, Lamont Johnson, Wesley Snipes19955.090760
40Comedy11862.0enJust when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own.8.387519Sandollar Productions, Touchstone PicturesUS1995-02-1076578911106.0EnglishReleasedJust When His World Is Back To Normal... He's In For The Surprise Of His Life!Father of the Bride Part II5.7173.0Charles ShyerSteve Martin, Diane Keaton, Martin Short, Kimberly Williams-Paisley, George Newbern, Kieran Culkin, BD Wong, Peter Michael Goetz, Kate McGregor-Stewart, Jane Adams, Eugene Levy, Lori Alan19950.000000
560000000Action, Crime, Drama, Thriller949.0enObsessive master thief, Neil McCauley leads a top-notch crew on various insane heists throughout Los Angeles while a mentally unstable detective, Vincent Hanna pursues him without rest. Each man recognizes and respects the ability and the dedication of the other even though they are aware their cat-and-mouse game may end in violence.17.924927Regency Enterprises, Forward Pass, Warner Bros.US1995-12-15187436818170.0English, EspañolReleasedA Los Angeles Crime SagaHeat7.71886.0Michael MannAl Pacino, Robert De Niro, Val Kilmer, Jon Voight, Tom Sizemore, Diane Venora, Amy Brenneman, Ashley Judd, Mykelti Williamson, Natalie Portman, Ted Levine, Tom Noonan, Tone Loc, Hank Azaria, Wes Studi, Dennis Haysbert, Danny Trejo, Henry Rollins, William Fichtner, Kevin Gage, Susan Traylor, Jerry Trimble, Ricky Harris, Jeremy Piven, Xander Berkeley, Begonya Plaza, Rick Avery, Hazelle Goodman, Ray Buktenica, Max Daniels, Vince Deadrick Jr., Steven Ford, Farrah Forke, Patricia Healy, Paul Herman, Cindy Katz, Brian Libby, Dan Martin, Mario Roberts, Thomas Rosales, Jr., Yvonne Zima, Mick Gould, Bud Cort, Viviane Vives, Kim Staunton, Martin Ferrero, Brad Baldridge, Andrew Camuccio, Kenny Endoso, Kimberly Flynn, Niki Harris, Bill McIntosh, Rick Marzan, Terry Miller, Daniel O'Haco, Kai Soremekun, Peter Blackwell, Trevor Coppola, Mary Kircher, Darin Mangan, Robert Miranda, Manny Perry, Iva Franks Singer, Tim Werner, Philip Ettington19953.123947
658000000Comedy, Romance11860.0enAn ugly duckling having undergone a remarkable change, still harbors feelings for her crush: a carefree playboy, but not before his business-focused brother has something to say about it.6.677277Paramount Pictures, Scott Rudin Productions, Mirage Enterprises, Sandollar Productions, Constellation Entertainment, Worldwide, Mont Blanc Entertainment GmbHDE, US1995-12-150127.0Français, EnglishReleasedYou are cordially invited to the most surprising merger of the year.Sabrina6.2141.0Sydney PollackHarrison Ford, Julia Ormond, Greg Kinnear, Angie Dickinson, Nancy Marchand, John Wood, Richard Crenna, Lauren Holly, Dana Ivey, Fanny Ardant, Patrick Bruel, Paul Giamatti, Miriam Colón, Elizabeth Franz, Valérie Lemercier, Becky Ann Baker, John C. Vennema, Margo Martindale, J. Smith-Cameron, Christine Luneau-Lipton, Michael Dees, Denis Holmes, Jo-Jo Lowe, Ira Wheeler, Philippa Cooper, Ayako Kawahara, François Genty, Guillaume Gallienne, Inés Sastre, Phina Oruche, Andrea Behalikova, Jennifer Herrera, Kristina Kumlin, Eva Linderholm, Carmen Chaplin, Micheline Van de Velde, Joanna Rhodes, Alan Boone, Patrick Forster-Delmas, Kentaro Matsuo, Peter McKernan, Ed Connelly, Ronald L. Schwary, Alvin Lum, Siching Song, Phil Nee, Randy Becker, Susan Browning, Anthony Mondal, Peter Parks, Woodrow Asai, Eric Bruno Borgman, Michael Cline, Christopher Del Gaudio, Philippe Hartmann, Jerry Quinn, Dori Rosenthal19950.000000
70Action, Adventure, Drama, Family45325.0enA mischievous young boy, Tom Sawyer, witnesses a murder by the deadly Injun Joe. Tom becomes friends with Huckleberry Finn, a boy with no future and no family. Tom has to choose between honoring a friendship or honoring an oath because the town alcoholic is accused of the murder. Tom and Huck go through several adventures trying to retrieve evidence.2.561161Walt Disney PicturesUS1995-12-22097.0English, DeutschReleasedThe Original Bad Boys.Tom and Huck5.445.0Peter HewittJonathan Taylor Thomas, Brad Renfro, Rachael Leigh Cook, Michael McShane, Amy Wright, Eric Schweig, Tamara Mello19950.000000
835000000Action, Adventure, Thriller9091.0enInternational action superstar Jean Claude Van Damme teams with Powers Boothe in a Tension-packed, suspense thriller, set against the back-drop of a Stanley Cup game.Van Damme portrays a father whose daughter is suddenly taken during a championship hockey game. With the captors demanding a billion dollars by game's end, Van Damme frantically sets a plan in motion to rescue his daughter and abort an impending explosion before the final buzzer...5.231580Universal Pictures, Imperial Entertainment, Signature EntertainmentUS1995-12-2264350171106.0EnglishReleasedTerror goes into overtime.Sudden Death5.5174.0Peter HyamsJean-Claude Van Damme, Powers Boothe, Dorian Harewood, Raymond J. Barry, Ross Malinger, Whittni Wright19951.838576
958000000Adventure, Action, Thriller710.0enJames Bond must unmask the mysterious head of the Janus Syndicate and prevent the leader from utilizing the GoldenEye weapons system to inflict devastating revenge on Britain.14.686036United Artists, Eon ProductionsGB, US1995-11-16352194034130.0English, Pусский, EspañolReleasedNo limits. No fears. No substitutes.GoldenEye6.61194.0Martin CampbellPierce Brosnan, Sean Bean, Izabella Scorupco, Famke Janssen, Joe Don Baker, Judi Dench, Gottfried John, Robbie Coltrane, Alan Cumming, Tchéky Karyo, Desmond Llewelyn, Samantha Bond, Michael Kitchen, Serena Gordon, Simon Kunz, Billy J. Mitchell, Constantine Gregory, Minnie Driver, Michelle Arthur, Ravil Isyanov19956.072311
|  | budget | genres | id | original_language | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | Director | actores | release_year | return |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 45335 | 0 | NaN | 67179.0 | it | Sentenced to life imprisonment for illegal activities, Italian International member Giulio Manieri holds on to his political ideals while struggling against madness in the loneliness of his prison cell. | 0.225051 | NaN | NaN | 1972-01-01 | 0 | 90.0 | Italiano | Released | NaN | St. Michael Had a Rooster | 6.0 | 3.0 | Paolo Taviani, Vittorio Taviani | Giulio Brogi, Renato Cestiè, Vito Cipolla, Daniele Dublino | 1972 | 0.0 |
| 45336 | 0 | Horror, Mystery, Thriller | 84419.0 | en | An unsuccessful sculptor saves a madman named "The Creeper" from drowning. Seeing an opportunity for revenge, he tricks the psycho into murdering his critics. | 0.222814 | Universal Pictures | US | 1946-03-29 | 0 | 65.0 | English | Released | Meet...The CREEPER! | House of Horrors | 6.3 | 8.0 | Jean Yarbrough | Rondo Hatton, Robert Lowery, Virginia Grey, Bill Goodwin, Martin Kosleck, Alan Napier, Howard Freeman, Virginia Christine, Joan Shawlee, Byron Foulger, Syd Saylor | 1946 | 0.0 |
| 45337 | 0 | Mystery, Horror | 390959.0 | en | In this true-crime documentary, we delve into the murder spree that was the inspiration for Joe Berlinger's "Book of Shadows: Blair Witch 2". | 0.076061 | NaN | NaN | 2000-10-22 | 0 | 45.0 | English | Released | NaN | Shadow of the Blair Witch | 7.0 | 2.0 | Ben Rock | Tony Abatemarco, Andre Brooks, Mariclare Costello, Bill Dreggors, Apollo Dukakis, Philip Friedman, James Gleason, Dilva Henry, Bari Hochwald, Wendy Hoffman, John Huck, Rachel Moskowitz, Sandy Mulvihill, Roger Nolan, Chris Parnell, Byrne Piven, Richard Sexton, Rich Williams, Ray Xifo | 2000 | 0.0 |
| 45338 | 0 | Horror | 289923.0 | en | A film archivist revisits the story of Rustin Parr, a hermit thought to have murdered seven children while under the possession of the Blair Witch. | 0.386450 | Neptune Salad Entertainment, Pirie Productions | US | 2000-10-03 | 0 | 30.0 | English | Released | Do you know what happened 50 years before "The Blair Witch Project"? | The Burkittsville 7 | 7.0 | 1.0 | Ben Rock | Monty Bane, Lucy Butler, David Grammer, Bill Dreggors, Frank Pastor, Heather Donahue, Joshua Leonard, Michael C. Williams | 2000 | 0.0 |
| 45339 | 0 | Science Fiction | 222848.0 | en | It's the year 3000 AD. The world's most dangerous women are banished to a remote asteroid 45 million light years from earth. Kira Murphy doesn't belong; wrongfully accused of a crime she did not commit, she's thrown in this interplanetary prison and left to her own defenses. But Kira's a fighter, and soon she finds herself in the middle of a female gang war; where everyone wants a piece of the action... and a piece of her! "Caged Heat 3000" takes the Women-in-Prison genre to a whole new level... and a whole new galaxy! | 0.661558 | Concorde-New Horizons | US | 1995-01-01 | 0 | 85.0 | English | Released | NaN | Caged Heat 3000 | 3.5 | 1.0 | Aaron Osborne | Lisa Boyle, Kena Land, Zaneta Polard, Don Yanan, Debra K. Beatty, Mark Sikes, Robert J. Ferrelli, Ellyn Dawn Humphreys, Ron Jeremy, Ben Ramsey | 1995 | 0.0 |
| 45340 | 0 | Drama, Action, Romance | 30840.0 | en | Yet another version of the classic epic, with enough variation to make it interesting. The story is the same, but some of the characters are quite different from the usual, in particular Uma Thurman's very special maid Marian. The photography is also great, giving the story a somewhat darker tone. | 5.683753 | Westdeutscher Rundfunk (WDR), Working Title Films, 20th Century Fox Television, CanWest Global Communications | CA, DE, GB, US | 1991-05-13 | 0 | 104.0 | English | Released | NaN | Robin Hood | 5.7 | 26.0 | John Irvin | Patrick Bergin, Uma Thurman, David Morrissey, Jürgen Prochnow, Jeroen Krabbé | 1991 | 0.0 |
| 45341 | 0 | Drama | 111109.0 | tl | An artist struggles to finish his work while a storyline about a cult plays in his head. | 0.178241 | Sine Olivia | PH | 2011-11-17 | 0 | 360.0 | NaN | Released | NaN | Century of Birthing | 9.0 | 3.0 | Lav Diaz | Angel Aquino, Perry Dizon, Hazel Orencio, Joel Torre, Bart Guingona, Soliman Cruz , Roeder, Angeli Bayani, Dante Perez, Betty Uy-Regala, Modesta | 2011 | 0.0 |
| 45342 | 0 | Action, Drama, Thriller | 67758.0 | en | When one of her hits goes wrong, a professional assassin ends up with a suitcase full of a million dollars belonging to a mob boss ... | 0.903007 | American World Pictures | US | 2003-08-01 | 0 | 90.0 | English | Released | A deadly game of wits. | Betrayal | 3.8 | 6.0 | Mark L. Lester | Erika Eleniak, Adam Baldwin, Julie du Page, James Remar, Damian Chapa, Louis Mandylor, Tom Wright, Jeremy Lelliott, James Quattrochi, Jason Widener, Joe Sabatino, Kiko Ellsworth, Don Swayze, Peter Dobson, Darrell Dubovsky | 2003 | 0.0 |
| 45343 | 0 | NaN | 227506.0 | en | In a small town live two brothers, one a minister and the other one a hunchback painter of the chapel who lives with his wife. One dreadful and stormy night, a stranger knocks at the door asking for shelter. The stranger talks about all the good things of the earthly life the minister is missing because of his puritanical faith. The minister comes to accept the stranger's viewpoint but it is others who will pay the consequences because the minister will discover the human pleasures thanks to, ehem, his sister- in -law… The tormented minister and his cuckolded brother will die in a strange accident in the chapel and later an infant will be born from the minister's adulterous relationship. | 0.003503 | Yermoliev | RU | 1917-10-21 | 0 | 87.0 | NaN | Released | NaN | Satan Triumphant | 0.0 | 0.0 | Yakov Protazanov | Iwan Mosschuchin, Nathalie Lissenko, Pavel Pavlov, Aleksandr Chabrov, Vera Orlova | 1917 | 0.0 |
| 45344 | 0 | NaN | 461257.0 | en | 50 years after decriminalisation of homosexuality in the UK, director Daisy Asquith mines the jewels of the BFI archive to take us into the relationships, desires, fears and expressions of gay men and women in the 20th century. | 0.163015 | NaN | GB | 2017-06-09 | 0 | 75.0 | English | Released | NaN | Queerama | 0.0 | 0.0 | Daisy Asquith | NaN | 2017 | 0.0 |